<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: kherud</title><link>https://news.ycombinator.com/user?id=kherud</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 10:17:41 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=kherud" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by kherud in "Claude Mythos: The System Card"]]></title><description><![CDATA[
<p>LLMs are extremely capable at problem solving, presumably because a lot of it can be learned autonomously. But can you somehow account for things like long-term maintainability and code quality (whatever that means), or do you always have to rely on either existing high-quality codebases (pre-training) or human-curated datasets? Since you can't really quantify these properties (as opposed to: the problem is either solved or not), does this restrict autonomous improvement in this area? Are there benchmarks that consider this? Could Claude Mythos create an ultra-high-quality version of Claude Code, or would it still produce something similar to earlier models, which are already more than sufficient in individual problem-solving capability?</p>
]]></description><pubDate>Mon, 13 Apr 2026 18:19:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47755932</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=47755932</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47755932</guid></item><item><title><![CDATA[New comment by kherud in "Music for Programming"]]></title><description><![CDATA[
<p>If I had to make one recommendation, it would be David August's Boiler Room set [1]. It has such a coherent flow through the whole set that it makes me fly through hours, if not days, of work.<p>[1] <a href="https://www.youtube.com/watch?v=mRfwdJx0NDE" rel="nofollow">https://www.youtube.com/watch?v=mRfwdJx0NDE</a></p>
]]></description><pubDate>Mon, 06 Apr 2026 09:14:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47658556</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=47658556</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47658556</guid></item><item><title><![CDATA[New comment by kherud in "Modern SQLite: Features You Didn't Know It Had"]]></title><description><![CDATA[
<p>SQLite seems very powerful for building full-text search (user enters free text, expects high precision/recall results). Still, I feel like it's non-trivial to get good search quality.<p>I think the naive approach is to tokenize the input and append "*" to each token for prefix matching. I'm not too experienced, and this can probably be improved a lot. There are many settings, like different tokenizers, stemming, etc. Additionally, a lot can be built on top, like weighting or boosting exact matches.<p>Does anyone know good resources to learn from and draw inspiration from?</p>
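<p>To make the naive approach concrete, here is a minimal Python sketch against an FTS5 table (table name, columns, tokenizer choice, and sample rows are made up for illustration): the user input is tokenized, and each token is quoted and given a "*" suffix for prefix matching.<p><pre><code>  import re
  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body, tokenize='porter unicode61')")
  con.execute("INSERT INTO docs VALUES ('Intro', 'SQLite full-text search with FTS5')")
  con.execute("INSERT INTO docs VALUES ('Other', 'Nothing relevant here')")

  def naive_query(user_input):
      # Turn free text into a sequence of quoted prefix terms.
      tokens = re.findall(r"\w+", user_input)
      return " ".join('"{}"*'.format(t) for t in tokens)

  query = naive_query("sqlit full-tex")   # partial words typed by the user
  rows = con.execute(
      "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
  ).fetchall()
  print(rows)   # [('Intro',)]
</code></pre>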
]]></description><pubDate>Thu, 02 Apr 2026 19:20:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47618967</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=47618967</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47618967</guid></item><item><title><![CDATA[New comment by kherud in "39th Chaos Communication Congress Videos"]]></title><description><![CDATA[
<p>One interesting detail: In previous years, Joscha Bach gave a talk on AI, consciousness, and related topics (see e.g. [0]). A similar talk was planned for this year as well, but after emails between him and Epstein were made public (see his comment on this in [1]), his talk was canceled. Instead, there appears to have been an event that critically addressed the situation [2]. Unfortunately it was not recorded. Did anyone attend? A discussion between Joscha and his critics would have been really interesting.<p>[0] <a href="https://media.ccc.de/v/38c3-self-models-of-loving-grace" rel="nofollow">https://media.ccc.de/v/38c3-self-models-of-loving-grace</a><p>[1] <a href="https://joscha.substack.com/p/on-the-jeffrey-epstein-affair" rel="nofollow">https://joscha.substack.com/p/on-the-jeffrey-epstein-affair</a><p>[2] <a href="https://events.ccc.de/congress/2025/hub/en/event/detail/tech-transcendentalism-as-hypermodern-myth-and-neo" rel="nofollow">https://events.ccc.de/congress/2025/hub/en/event/detail/tech...</a></p>
]]></description><pubDate>Fri, 02 Jan 2026 14:07:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=46464856</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=46464856</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46464856</guid></item><item><title><![CDATA[New comment by kherud in "Antlr-Ng Parser Generator"]]></title><description><![CDATA[
<p>I'm a fan of antlr-ng. It's a solid upgrade if you're already using antlr. In my experience, they're fully compatible. antlr's ALL(*) parsing is relatively powerful for a parser generator, but it lacks support for incremental parsing. antlr-ng might improve things enough to be usable interactively in smaller settings, even if you need to reparse the document each time. It also comes with useful extensions like <a href="https://github.com/mike-lischke/antlr4-c3" rel="nofollow">https://github.com/mike-lischke/antlr4-c3</a>, which generates syntactic and semantic completions directly from the grammar.</p>
]]></description><pubDate>Sat, 13 Sep 2025 08:58:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=45230538</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=45230538</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45230538</guid></item><item><title><![CDATA[New comment by kherud in "What does Palantir actually do?"]]></title><description><![CDATA[
<p>It's probably "Reflections on Palantir" <a href="https://news.ycombinator.com/item?id=41855006">https://news.ycombinator.com/item?id=41855006</a></p>
]]></description><pubDate>Thu, 14 Aug 2025 21:48:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44906085</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=44906085</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44906085</guid></item><item><title><![CDATA[New comment by kherud in "Preventing Flash of Incomplete Markdown when streaming AI responses"]]></title><description><![CDATA[
<p>Is there a general solution to this problem? I assume you only have to start buffering tokens once you see a construct that has possible continuations which, once completed, would cause the preceding text to be rendered differently. Of course you don't want to keep buffering for too long, since that would defeat the purpose of streaming, and you never know whether the potential construct will actually be generated. The solution probably also has to be context sensitive; for example, within code blocks you never want to render links for []() constructs.<p>EDIT: One library I found is <a href="https://github.com/thetarnav/streaming-markdown">https://github.com/thetarnav/streaming-markdown</a> which seems to combine incremental parsing with optimistic rendering, which I guess works well enough in practice.</p>
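<p>A rough sketch of that buffering idea in Python (the regex and the example chunks are purely illustrative and have nothing to do with how streaming-markdown works internally): flush everything up to the last possibly-incomplete construct and hold the rest back until it resolves.<p><pre><code>  import re

  # Trailing fragments whose completion could change how earlier text renders:
  # an unclosed link/image, a dangling emphasis run, or unclosed inline code.
  AMBIGUOUS_TAIL = re.compile(r"(!?\[[^\]]*(\]\([^)]*)?$|\*+$|_+$|`+[^`]*$)")

  def split_safe(buffer):
      """Split streamed text into (render_now, hold_back)."""
      match = AMBIGUOUS_TAIL.search(buffer)
      if match is None:
          return buffer, ""
      return buffer[:match.start()], buffer[match.start():]

  pending = ""
  for chunk in ["Click [he", "re](https://exa", "mple.com) please"]:
      pending += chunk
      safe, pending = split_safe(pending)
      print(repr(safe))    # flush what is safe to render
  print(repr(pending))     # whatever is still ambiguous at the end
</code></pre>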
]]></description><pubDate>Wed, 04 Jun 2025 18:10:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=44183736</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=44183736</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44183736</guid></item><item><title><![CDATA[New comment by kherud in "The Legend of Holy Sword: An Immersive Experience for Concentration Enhancement"]]></title><description><![CDATA[
<p>That was my association as well! Dune even uses similar vocabulary. For example, someone mentioned "pranayama" in this thread, which sounds a lot like Dune's "Prana-bindu". It really makes me wonder about Frank Herbert's own experiences with all of this.</p>
]]></description><pubDate>Sat, 14 Sep 2024 06:53:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537993</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=41537993</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537993</guid></item><item><title><![CDATA[New comment by kherud in "Learning to Reason with LLMs"]]></title><description><![CDATA[
<p>Aren't LLMs much more limited in the number of output tokens than input tokens? For example, GPT-4o seems to support only up to 16K output tokens. I'm not completely sure what the reason is, but I wonder how that interacts with Chain-of-Thought reasoning.</p>
]]></description><pubDate>Thu, 12 Sep 2024 19:35:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=41524644</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=41524644</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41524644</guid></item><item><title><![CDATA[New comment by kherud in "Features I'd like to see in future IDEs"]]></title><description><![CDATA[
<p>Maybe you're already aware of it, but there is difftastic [0], a syntax-aware diff tool that can also be used with git. Its understanding of syntax is based on tree-sitter, so it works for most languages. Although I haven't tried it, I think most IDEs should also be able to use it.<p>[0] <a href="https://difftastic.wilfred.me.uk/" rel="nofollow">https://difftastic.wilfred.me.uk/</a></p>
]]></description><pubDate>Fri, 09 Aug 2024 19:05:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=41204491</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=41204491</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41204491</guid></item><item><title><![CDATA[New comment by kherud in "A new type of neural network is more interpretable"]]></title><description><![CDATA[
<p>Interesting, thanks for sharing! Do you have an explanation or idea why compilation slows some architectures down?</p>
]]></description><pubDate>Mon, 05 Aug 2024 18:24:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=41163915</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=41163915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41163915</guid></item><item><title><![CDATA[New comment by kherud in "Can reading make you happier? (2015)"]]></title><description><![CDATA[
<p>I think it's more about invested work vs. reward.<p>Mindless browsing is one of the lowest-effort activities, but the influx of information is highly rewarding for the brain. That's why it's so addictive. Programming and OS installation are more work, but there is direct progress. Filing taxes is just work, but again, it's a very direct way to feel productive. All of these activities are immediately rewarding.<p>Reading, on the other hand, requires a lot of concentration without much immediate reward. And I think the ratio here is highly subjective for most people.</p>
]]></description><pubDate>Sun, 04 Aug 2024 07:51:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=41151814</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=41151814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41151814</guid></item><item><title><![CDATA[New comment by kherud in "Data Fetching for Single-Page Apps"]]></title><description><![CDATA[
<p>Let's say you want to show a modal, which fetches some data and modifies the state. Based on this, new children are rendered which again fetch data. The problem of "spaghetti fetching" gets worse the more levels of recursive fetching there are. If I understand you correctly, you argue for fetching all data upfront and then rendering the modal and all of its children at once. This way you ensure "UI = f(state)" by removing side effects from "f".<p>On the other hand, I can also see some drawbacks:<p><pre><code>  1. It goes against the idea of fetching data close to where it's used, which is what keeps components modular.
  2. From the POV of the children, you have to backtrack to see where their data comes from.
  3. If components always need the same data, you have to duplicate that fetching everywhere you want to use them.
  4. You can't partially show children; you have to wait until all of them have their data before rendering anything.
</code></pre>
I feel like there are trade-offs to be made here.</p>
]]></description><pubDate>Mon, 22 Jul 2024 19:20:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=41038664</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=41038664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41038664</guid></item><item><title><![CDATA[Does GPT-4o use OCR for vision?]]></title><description><![CDATA[
<p>Article URL: <a href="https://kherud.github.io/blog/gpt4o-vision-ocr/">https://kherud.github.io/blog/gpt4o-vision-ocr/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40946225">https://news.ycombinator.com/item?id=40946225</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 12 Jul 2024 15:01:47 +0000</pubDate><link>https://kherud.github.io/blog/gpt4o-vision-ocr/</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=40946225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40946225</guid></item><item><title><![CDATA[New comment by kherud in "How Does GPT-4o Encode Images?"]]></title><description><![CDATA[
<p>Shouldn't this theory be testable? The response time for an image of the same size should remain constant (assuming a generated response of constant size). You could then try to put an increasing amount of text inside the image. If this text is fed to the LLM using OCR, the total number of tokens grows, and you should be able to observe an increase in response time.</p>
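<p>A rough sketch of such a measurement in Python (the model name, prompt, and word counts are just examples, and single measurements are noisy because of network variance and image preprocessing): render increasing amounts of text into a fixed-size image, send it with a one-token response budget, and time the calls.<p><pre><code>  import base64, io, time
  from PIL import Image, ImageDraw
  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set

  def image_with_text(words):
      # Fixed-size image, variable amount of embedded (legible) text.
      img = Image.new("RGB", (1024, 1024), "white")
      draw = ImageDraw.Draw(img)
      lines = [" ".join("word{}".format(i) for i in range(j, j + 10))
               for j in range(0, words, 10)]
      draw.multiline_text((10, 10), "\n".join(lines), fill="black")
      buf = io.BytesIO()
      img.save(buf, format="PNG")
      return base64.b64encode(buf.getvalue()).decode()

  for words in (0, 100, 400):
      b64 = image_with_text(words)
      start = time.time()
      client.chat.completions.create(
          model="gpt-4o",
          max_tokens=1,  # keep the generated response length constant
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "Reply with OK."},
                  {"type": "image_url",
                   "image_url": {"url": "data:image/png;base64," + b64}},
              ],
          }],
      )
      print(words, "words:", round(time.time() - start, 2), "s")
</code></pre>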
]]></description><pubDate>Fri, 07 Jun 2024 14:06:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=40608873</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=40608873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40608873</guid></item><item><title><![CDATA[New comment by kherud in "Show HN: Open-source load balancer for llama.cpp"]]></title><description><![CDATA[
<p>I think this comment explains it <a href="https://github.com/ggerganov/llama.cpp/discussions/4130#discussioncomment-8053636">https://github.com/ggerganov/llama.cpp/discussions/4130#disc...</a>
As far as I understand (and mcharytoniuk can confirm this better than I can), llama.cpp lets you split the context window of an LLM into independent blocks ("slots"), so that multiple requests can be processed in a single inference pass. I think that due to the auto-regressive nature of LLMs, you also don't have to wait for all sequences to finish before returning them. As soon as one sequence finishes, its slot in the context window can be reused for other requests.</p>
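<p>A toy Python illustration of the slot idea (this mirrors only the scheduling concept, not llama.cpp's actual implementation): a fixed number of slots share the context window, every decode step advances each occupied slot by one token, and a slot is handed to the next queued request the moment its sequence finishes.<p><pre><code>  from collections import deque

  NUM_SLOTS = 2
  # Each request needs a different number of output tokens.
  queue = deque([("req-A", 3), ("req-B", 5), ("req-C", 2), ("req-D", 1)])
  slots = [None] * NUM_SLOTS   # (name, remaining_tokens) or None

  step = 0
  while queue or any(slots):
      # Fill free slots from the queue; no need to wait for a full batch.
      for i, slot in enumerate(slots):
          if slot is None and queue:
              slots[i] = queue.popleft()
              print("step", step, "slot", i, "starts", slots[i][0])
      # One batched decode step advances every occupied slot by one token.
      for i, slot in enumerate(slots):
          if slot is not None:
              name, remaining = slot
              remaining -= 1
              if remaining == 0:
                  print("step", step, "slot", i, "finished", name)
                  slots[i] = None   # freed immediately for the next request
              else:
                  slots[i] = (name, remaining)
      step += 1
</code></pre>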
]]></description><pubDate>Sun, 02 Jun 2024 11:46:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=40553309</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=40553309</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40553309</guid></item><item><title><![CDATA[New comment by kherud in "Gemini Flash"]]></title><description><![CDATA[
<p>Now that context length seems abundant for most tasks, I'm wondering why sub-word tokens are still used. I'm really curious how character-based LLMs would compare. With 2M context, the compute bottleneck fades away. I'm not sure, though, what role the vocabulary size plays. Maybe a large vocabulary is critical, since the embedding already contains a big chunk of the knowledge. On the other hand, using a character-based vocabulary would solve multiple problems, I think, like glitch tokens and possibly things like arithmetic and rhyming capabilities. Implementing sub-word tokenizers correctly and training them also seems quite complex. On a character level, this should be trivial.</p>
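<p>A quick back-of-envelope comparison in Python (hidden size, vocabulary size, and the characters-per-token ratio are rough assumptions, not figures for any particular model): the embedding table shrinks dramatically with a character vocabulary, but the same text becomes several times longer, which feeds back into attention cost.<p><pre><code>  d_model = 8192                        # assumed hidden size
  subword_vocab, char_vocab = 100_000, 256
  chars_per_token = 4                   # rough average for English text

  # Embedding (and tied output) parameters scale with vocabulary size.
  print("sub-word embedding params:", subword_vocab * d_model)   # ~819M
  print("character embedding params:", char_vocab * d_model)     # ~2M

  # The same document is roughly 4x longer in characters, so attention cost,
  # which grows roughly quadratically with sequence length, grows ~16x.
  tokens = 2_000_000
  print("character sequence length:", tokens * chars_per_token)
  print("attention cost factor:", chars_per_token ** 2)
</code></pre>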
]]></description><pubDate>Tue, 14 May 2024 20:55:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40360056</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=40360056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40360056</guid></item><item><title><![CDATA[New comment by kherud in "Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion"]]></title><description><![CDATA[
<p>Thank you for sharing! On a tangent: I'm wondering if there are any good open-source models/libraries to reconstruct audio quality. I'm thinking about an end-to-end open-source alternative to something like Adobe Podcast [1] to make noisy recordings sound professional; anecdotally, it's supposed to be very good. In a recent search, I haven't found anything convincing. In my naive view, this task seems much simpler than audio generation, and the demand far greater, since not everyone has a professional audio setup ready at all times.<p>[1] <a href="https://podcast.adobe.com/" rel="nofollow noreferrer">https://podcast.adobe.com/</a></p>
]]></description><pubDate>Wed, 13 Sep 2023 10:48:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=37494985</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=37494985</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37494985</guid></item><item><title><![CDATA[New comment by kherud in "GPT-3.5/4 response times are linear with output tokens"]]></title><description><![CDATA[
<p>Why is this the expected result? The original transformer algorithm has n^2 computational complexity, where n is the number of tokens. As far as I know, there are some improvements that bring it down to something like n*log(n). Linear complexity seems surprising, however. Is the reason that the attention computation can be completely parallelized with decent hardware, so the response time stays linear?</p>
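<p>One way the numbers could work out, as a back-of-envelope sketch in Python (the prompt length and hidden size are arbitrary assumptions, and this ignores batching, memory bandwidth, and everything else that matters in practice): if the prompt is processed once and its keys/values cached, each new output token attends over the prefix seen so far, but the per-token cost is dominated by the fixed per-token weight work, so the total ends up roughly proportional to the number of output tokens.<p><pre><code>  prompt = 2000          # input tokens, processed once and then cached
  d_model = 12288        # assumed hidden size

  def step_cost(t):
      # Very rough proxy for the cost of generating output token t.
      attention = (prompt + t) * d_model        # grows slowly with t
      feed_forward = 8 * d_model * d_model      # constant per token, dominates
      return attention + feed_forward

  for tokens in (10, 100, 1000):
      total = sum(step_cost(t) for t in range(tokens))
      print(tokens, "output tokens:", round(total / step_cost(0), 1), "x one step")
</code></pre>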
]]></description><pubDate>Mon, 21 Aug 2023 20:14:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=37214759</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=37214759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37214759</guid></item><item><title><![CDATA[New comment by kherud in "Numbers every LLM developer should know"]]></title><description><![CDATA[
<p>Can somebody please explain how quantization below 8 bit works? Since a byte is, I think, the smallest addressable unit, is the dimensionality of the weights somehow reduced?</p>
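<p>As an illustration of the general idea (this is just the packing concept, not the format of any particular library): the dimensionality stays the same, but several sub-byte values share one byte and are unpacked on the fly, with a scale factor per group to map them back to floats. A minimal 4-bit sketch in Python:<p><pre><code>  import numpy as np

  weights = np.array([0.11, -0.32, 0.25, -0.08], dtype=np.float32)

  # Map each weight to a 4-bit integer (0..15) with one scale for the group.
  scale = np.abs(weights).max() / 7.0
  q = np.clip(np.round(weights / scale) + 8, 0, 15).astype(np.uint8)

  # Pack two 4-bit values per byte: same number of weights, half the bytes.
  packed = q[0::2] * 16 + q[1::2]
  print(packed.nbytes, "bytes for", weights.size, "weights")   # 2 bytes for 4

  # Unpack and dequantize on the fly during inference.
  high, low = packed // 16, packed % 16
  restored = (np.stack([high, low], axis=1).reshape(-1).astype(np.float32) - 8) * scale
  print(restored)   # approximately the original weights
</code></pre>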
]]></description><pubDate>Wed, 17 May 2023 19:42:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=35980242</link><dc:creator>kherud</dc:creator><comments>https://news.ycombinator.com/item?id=35980242</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35980242</guid></item></channel></rss>