<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ggnore7452</title><link>https://news.ycombinator.com/user?id=ggnore7452</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 22:36:19 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ggnore7452" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ggnore7452 in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>too bad that only the smaller on-device models support native audio input.</p>
]]></description><pubDate>Fri, 03 Apr 2026 07:13:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47624033</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=47624033</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47624033</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Google AI Overviews cite YouTube more than any medical site for health queries"]]></title><description><![CDATA[
<p>imo, for health-related stuff, as with most general knowledge, you don't need info more recent than 2023.
The internal knowledge of an LLM is so much better than the web-search-augmented answer.</p>
]]></description><pubDate>Mon, 26 Jan 2026 20:41:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=46771210</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=46771210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46771210</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Google co-founder reveals that "many" of the new hires do not have a degree"]]></title><description><![CDATA[
<p>I’m fine with hires without degrees. But if Google still filters people with LeetCode-style coding questions, what’s the point of that in this day and age?</p>
]]></description><pubDate>Tue, 20 Jan 2026 22:23:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46698518</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=46698518</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46698518</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Why our website looks like an operating system"]]></title><description><![CDATA[
<p>so immersive i actually hit ctrl+w and closed the whole tab.</p>
]]></description><pubDate>Fri, 12 Sep 2025 04:48:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=45218711</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=45218711</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45218711</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Llama-Scan: Convert PDFs to Text W Local LLMs"]]></title><description><![CDATA[
<p>I’ve done a similar PDF → Markdown workflow.<p>For each page:<p>- Extract text as usual.<p>- Capture the whole page as an image (~200 DPI).<p>- Optionally extract images/graphs within the page and include them in the same LLM call.<p>- Optionally add a bit of context from neighboring pages.<p>Then wrap everything with a clear prompt (structured output + how you want graphs handled), and you’re set.<p>At this point, models like GPT-5-nano/mini or Gemini 2.5 Flash are cheap and strong enough to make this practical.<p>Yeah, it’s a bit like using a rocket launcher on a mosquito, but it’s actually very easy to implement and quite flexible and powerful: it works across almost any format, Markdown is both AI- and human-friendly, and the result is surprisingly maintainable.</p>
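In case it helps, a minimal sketch of how the per-page LLM call could be assembled. The prompt wording and helper name are my own, and the actual page rendering (e.g. PyMuPDF at ~200 DPI) is out of scope here — this just shows one plausible multimodal message shape:

```python
import base64

def build_page_message(page_text, page_png_bytes, prev_context=""):
    """Assemble one multimodal chat message for a single PDF page.

    page_png_bytes: the page rendered to PNG (e.g. at ~200 DPI with a
    library like PyMuPDF -- rendering itself is not shown here).
    """
    image_b64 = base64.b64encode(page_png_bytes).decode("ascii")
    prompt = (
        "Convert this PDF page to clean Markdown. "
        "Describe any graphs/figures in a blockquote.\n\n"
        f"Context from the previous page:\n{prev_context}\n\n"
        f"Extracted text layer:\n{page_text}"
    )
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Image sent inline as a data URL alongside the text layer.
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }
```

One message like this per page, plus a system prompt for the structured-output rules, is basically the whole pipeline.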
]]></description><pubDate>Mon, 18 Aug 2025 02:16:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=44936754</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=44936754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44936754</guid></item><item><title><![CDATA[New comment by ggnore7452 in "A new ChatGPT version just dropped and GeoGuesser is now a solved problem"]]></title><description><![CDATA[
<p>I’ve been using LLMs for this kind of geo-guessing since Gemini 2.0.
Even without access to internet search (unlike o3), they perform surprisingly well.</p>
]]></description><pubDate>Fri, 18 Apr 2025 00:48:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43723812</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=43723812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43723812</guid></item><item><title><![CDATA[New comment by ggnore7452 in "New tools for building agents"]]></title><description><![CDATA[
<p>appreciate the question on hparams for websearch!<p>one of the main reasons I build these AI search tools from scratch is that I can fully control the depth and breadth (and also customize the loader for whatever data/sites I need).
Currently the built-in web search isn’t very transparent about which sites it lacks full text for and where it just uses snippets.<p>having computer use + web search is definitely something very powerful (essentially OpenAI’s deep research)</p>
]]></description><pubDate>Tue, 11 Mar 2025 18:29:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43335568</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=43335568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43335568</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Open Source Voice Cloning at Its Best"]]></title><description><![CDATA[
<p>How does this compare to the likes of Fish Audio?
Wish they supported voice cloning from longer audio, though.<p>Haven’t looked into this space for a few months, but iirc the previous SOTA was GPT VITS or something?</p>
]]></description><pubDate>Sun, 16 Feb 2025 01:37:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43064468</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=43064468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43064468</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Has anyone tried this? Is it actually better overall than XGBoost/CatBoost?</p>
]]></description><pubDate>Thu, 09 Jan 2025 22:20:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=42650431</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=42650431</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42650431</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Embeddings are underrated"]]></title><description><![CDATA[
<p>if anything I’d consider embeddings a bit overrated, or at least safer to underrate.<p>They’re not the silver bullet many initially hoped for, and they’re not a complete replacement for simpler methods like BM25. They have only a limited "semantic understanding" (and as people throw increasingly large chunks into embedding models, the meanings get even fuzzier).<p>Overly high expectations lead people to believe that embeddings will retrieve exactly what they mean, and with larger top-k values and LLMs that are exceptionally good at rationalizing responses, mismatches can be hard to notice unless you examine the results closely.</p>
]]></description><pubDate>Fri, 01 Nov 2024 16:16:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=42018485</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=42018485</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42018485</guid></item><item><title><![CDATA[New comment by ggnore7452 in "SimpleQA"]]></title><description><![CDATA[
<p>What’s more interesting to me here are the calibration graphs:<p>• LLMs, at least GPT models, tend to overstate their confidence.<p>• A frequency-based approach appears to achieve calibration closer to the ideal.<p>This kinda passes my vibe test. That said, I wonder: rather than running 100 trials, could we approximate this with something like a log-probability ratio? This would especially apply in cases where answers are yes or no, assuming the output spans more than one token.</p>
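The log-prob idea could be sketched like this — the two token log-probs would come from the API’s logprobs output, and the renormalization over just the two answer tokens is my own sketch, not anything from the paper:

```python
import math

def yes_no_confidence(logprob_yes, logprob_no):
    """Approximate answer confidence from first-token log-probabilities
    instead of repeated sampling: renormalize probability mass over just
    the two candidate answer tokens."""
    p_yes = math.exp(logprob_yes)
    p_no = math.exp(logprob_no)
    return p_yes / (p_yes + p_no)
```

One API call instead of 100 trials, at the cost of ignoring any probability mass on other phrasings of the answer.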
]]></description><pubDate>Wed, 30 Oct 2024 21:16:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=42000419</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=42000419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42000419</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Show HN: Claude Memory – Long-term memory for Claude"]]></title><description><![CDATA[
<p>Side note: I feel like ChatGPT's long-term memory isn't implemented properly. If you check the 'saved memories,' they are just bad.</p>
]]></description><pubDate>Thu, 05 Sep 2024 20:48:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=41460288</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=41460288</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41460288</guid></item><item><title><![CDATA[New comment by ggnore7452 in "CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data"]]></title><description><![CDATA[
<p>question: any good on-device-sized image embedding models?<p>tried <a href="https://github.com/unum-cloud/uform">https://github.com/unum-cloud/uform</a>, which I do like, especially since it also supports languages other than English. Any recommendations for alternatives?</p>
]]></description><pubDate>Thu, 25 Apr 2024 19:29:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=40161948</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=40161948</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40161948</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Show HN: I've built a locally running Perplexity clone"]]></title><description><![CDATA[
<p>I've been working on a small personal project similar to this and agree that replicating the overall experience provided by Perplexity.ai, or even improving on it for personal use, isn't that challenging.
(Concerns about scale and cost are less significant in personal projects. Perplexity doesn't do much planning or query expansion, nor does it dig super deep into the sources, afaik.)<p>I must say, though, that they are doing a commendable job integrating sources like YouTube and Reddit. These platforms benefit from special preprocessing and indeed add value.</p>
]]></description><pubDate>Thu, 04 Apr 2024 04:33:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=39926550</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=39926550</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39926550</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Groq runs Mixtral 8x7B-32k with 500 T/s"]]></title><description><![CDATA[
<p>more on the LPU and data center:
<a href="https://wow.groq.com/lpu-inference-engine/" rel="nofollow">https://wow.groq.com/lpu-inference-engine/</a><p>price and speed benchmark:
<a href="https://wow.groq.com/" rel="nofollow">https://wow.groq.com/</a></p>
]]></description><pubDate>Mon, 19 Feb 2024 20:22:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=39434333</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=39434333</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39434333</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Groq runs Mixtral 8x7B-32k with 500 T/s"]]></title><description><![CDATA[
<p>The Groq demo was indeed impressive. I work with LLMs a lot at work, and a generation speed of 500+ tokens/s would definitely change how we use these products. (Especially considering it's an early-stage product.)<p>But the "completely novel silicon architecture" and the "self-developed LPU" (claiming not to use GPUs)... make me a bit skeptical. After all, pure speed might be achievable by stacking computational power and quantizing the model. Shouldn't innovation at the GPU level be quite challenging, especially to achieve such groundbreaking speeds?</p>
]]></description><pubDate>Mon, 19 Feb 2024 20:08:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=39434144</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=39434144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39434144</guid></item><item><title><![CDATA[New comment by ggnore7452 in "LangChain Announces 10M Seed Round"]]></title><description><![CDATA[
<p>it still works well with other LLMs (like LLaMA and more).<p>various small, open-source, vertical LLMs vs. one large GPT model would be quite interesting.</p>
]]></description><pubDate>Tue, 04 Apr 2023 22:26:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=35446679</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=35446679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35446679</guid></item><item><title><![CDATA[New comment by ggnore7452 in "LangChain Announces 10M Seed Round"]]></title><description><![CDATA[
<p>The place where I work was an early adopter of LLMs, having started working with them a year ago.<p>When I built stuff with GPT-3, especially in the earlier days, I got the strong impression that we were doing machine learning without NumPy and Pandas.<p>With LangChain, many of the systems I have built can be done in just one or two lines, making life much easier for the rest of us. I also believe that LangChain's Agent framework is underappreciated, as it was pretty ahead of its time until the official ChatGPT plugins were released. (I've contributed to LangChain a bit too.)<p>Unfortunately, the documentation is indeed lacking. While I understand the need to move quickly, it is not good that some crucial concepts like customized LLMs have inadequate documentation. (Perhaps having some LLMs built on top of the repo would be more effective than documentation at this point.)</p>
]]></description><pubDate>Tue, 04 Apr 2023 17:43:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=35443341</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=35443341</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35443341</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Show HN: Summate.it – Quickly summarize web articles with OpenAI"]]></title><description><![CDATA[
<p>there is actually a paper by OpenAI themselves on summarizing long documents:
essentially, break the text into smaller chunks and run a multi-stage, sequential summarization, where each chunk uses a trailing window of the previous chunk as context, applied recursively.
<a href="https://arxiv.org/abs/2109.10862" rel="nofollow">https://arxiv.org/abs/2109.10862</a><p>did a rough implementation myself; it works well even for articles of 20k tokens, but it's kinda slow (and more costly) because of all the additional overlapping runs required.</p>
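My rough implementation of the chunk-with-trailing-window part looked roughly like this — chunk sizes are placeholders and the summarize callback stands in for the actual LLM call, so this is the skeleton, not the paper’s exact setup:

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into chunks, each prefixed with a trailing
    window of the previous chunk as context."""
    chunks, start = [], 0
    while start < len(tokens):
        begin = max(0, start - overlap)  # reach back into the prior chunk
        chunks.append(tokens[begin:start + chunk_size])
        start += chunk_size
    return chunks

def summarize_long(tokens, summarize, chunk_size=2000, overlap=200):
    """Sequentially summarize chunks, feeding the running summary
    back into each subsequent call."""
    summary = ""
    for chunk in chunk_with_overlap(tokens, chunk_size, overlap):
        summary = summarize(summary, chunk)  # one LLM call per chunk
    return summary
```

The overlap is where the extra cost comes from: every chunk re-sends `overlap` tokens that the previous call already saw, on top of the per-chunk call overhead.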
]]></description><pubDate>Wed, 18 Jan 2023 09:43:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=34424973</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=34424973</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34424973</guid></item><item><title><![CDATA[New comment by ggnore7452 in "Notion AI – waiting list signup"]]></title><description><![CDATA[
<p>I use large language models for work and use Notion daily.<p>while I like the "AI" part (the large language model),
I think it would be more interesting and productive to use the same backend for full-text semantic search & question answering or summarization.<p>But it is cool to see Notion trying this; kinda curious to see the results when so many people have access to this type of generative model.</p>
]]></description><pubDate>Wed, 16 Nov 2022 18:44:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=33628326</link><dc:creator>ggnore7452</dc:creator><comments>https://news.ycombinator.com/item?id=33628326</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33628326</guid></item></channel></rss>