Hacker News: kouteiheika

New comment by kouteiheika in "AI is just unauthorised plagiarism at a bigger scale"

kouteiheika — Thu, 21 May 2026 16:48:10 +0000

> What do people imagine can be done about it at this point? Offer a concrete suggestion.

Simple. Free the companies from copyright liability, but after X amount of time they are required to release everything into the commons. The weights, the training scripts and the full training data (appropriately processed so that it can only be used for training and not for people to easily pirate whatever works were used). They'd still get a monopoly on their model for a little bit to recoup their training costs, but in the end would be forced to give back what they took.

New comment by kouteiheika in "LLM Policy for Rust Compiler"

kouteiheika — Fri, 15 May 2026 13:09:16 +0000

> now they're taking purely emotional stances like "AI evil"

But they aren't? Nowhere in the document does it say this; in fact, it says the opposite - that they don't want to make a moral judgement.

> It's already bad enough a programming language wants to play politics (doesn't matter what my politics are if I want to code in the c "community")

It also doesn't matter what your politics are in the Rust community. My personal politics don't agree with the majority of prominent Rust contributors either, and that's fine. It doesn't (and hasn't) stopped me from being able to use Rust for over a decade now. Ignore politics and just engage on a purely technical level, and you'll be fine.

New comment by kouteiheika in "Access to frontier AI will soon be limited by economic and security constraints"

kouteiheika — Fri, 15 May 2026 11:19:10 +0000

> I would imagine not single everyone on HN have enough disposable income that allow us to subscribe Claude Max or other similar max plan of other models without thinking.

You don't need the Max plan with other models if you're not going completely crazy. Other providers have much more generous limits than Anthropic.

New comment by kouteiheika in "Access to frontier AI will soon be limited by economic and security constraints"

kouteiheika — Fri, 15 May 2026 11:09:27 +0000

Depends on which exact model we're talking about, and on your salary.

For example, with the $40/month Kimi Code subscription the limits are so generous that you can use it every day all the time for everything (basically just have an agent constantly running doing something) and never run out of tokens/hit the limits.

New comment by kouteiheika in "LLM Policy for Rust Compiler"

kouteiheika — Fri, 15 May 2026 05:24:07 +0000

> Like seriously, what's the point of explicitly allowing this? Imagine the opposite were true, you weren't allowed to do this - what would they do?

Imagine if they just say "LLMs are banned" then there's a lot of ambiguity. So they specifically outlined that generative uses of LLMs are banned, and that non-generative ones are not banned (i.e. "allowed").

I think it's a poor choice of words on their part, but it makes sense (considering what their policy is). It's more of a "we're not disallowing use in these particular scenarios, so you can still use LLMs for these if you want". Remember: it's a big project, and if they don't explicitly state something then people will ask and waste everyone's time.

New comment by kouteiheika in "Linux gaming is faster because Windows APIs are becoming Linux kernel features"

kouteiheika — Wed, 13 May 2026 20:48:40 +0000

NVidia supports their GPUs for a really long time (unlike AMD, which paradoxically drops official support really fast; e.g. see their ROCm support). Anyway, by the time NVidia drops support for their current newer GPUs there's a high chance that NVK[1] will be ready for general use.

[1] -- https://docs.mesa3d.org/drivers/nvk.html

New comment by kouteiheika in "DeepSeek V4 – almost on the frontier"

kouteiheika — Sat, 02 May 2026 15:17:21 +0000

Are you kidding?

The main difference here is not that DeepSeek's model is completely free of censorship (although I'd wager it's less censored), but that it's open-weight. That has two major advantages:

1) If Anthropic/OpenAI/Google bans you - you're screwed, you can't access their model at all, but if DeepSeek bans you - you just go to another provider, or host the model yourself.

2) If the model refuses to answer you can uncensor it (and this is getting easier and more automated day-by-day[1]).

[1] -- https://github.com/p-e-w/heretic

New comment by kouteiheika in "Show HN: I benchmarked how good LLMs are at proofreading English"

kouteiheika — Sat, 25 Apr 2026 19:21:16 +0000

It'd be nice if you could add a separate leaderboard for open-weight models on your results page (or add the ability to filter out proprietary models).

Also, why use an agent for this? This doesn't make much sense to me, considering it's supposed to be "measuring how well models can find and fix errors in human-written text" -- here you're just as much measuring the model's agentic capabilities as you're measuring its ability to correct the text.

I suppose this is somewhat of an interesting benchmark too, but if I were interested in cost-effective proofreading of a ton of text I'd just do it the old fashioned way: split my text into chunks, write a nice prompt telling the model to proofread the given text and return me the result, attach the prompt to each chunk of text to proofread, and let it rip.
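A minimal sketch of that old-fashioned approach, under some assumptions: `call_model` is a hypothetical stand-in for whatever API or local model you use, the prompt wording is made up, and chunks are split on blank lines with an arbitrary character budget.

```python
from typing import Callable, List

# Hypothetical prompt; tune the wording for your model.
PROMPT = (
    "Proofread the text below. Fix spelling, grammar and punctuation, "
    "change nothing else, and return only the corrected text.\n\n"
)


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily pack whole paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def proofread(text: str, call_model: Callable[[str], str], max_chars: int = 2000) -> str:
    """Attach the prompt to every chunk, call the model, and stitch the results."""
    fixed = [call_model(PROMPT + chunk) for chunk in split_into_chunks(text, max_chars)]
    return "\n\n".join(fixed)
```

Every chunk is an independent request, so the calls can be batched or run in parallel, which is where the cost savings over an agent loop would come from.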

New comment by kouteiheika in "Claude Token Counter, now with model comparisons"

kouteiheika — Mon, 20 Apr 2026 08:31:28 +0000

> That's the pre-tokenizer, not the tokenizer.

Yes, it's an extra tokenizer which runs before the learned tokenizer and injects an inductive bias into it.

> That is mostly a performance optimization that lets the memory requirements for the BPE tokenizer be a lot less.

While it does indeed speed up training of the tokenizer, no, it isn't mostly just a performance optimization? It injects a clear-cut inductive bias into the tokenizer (split by words, split by punctuation, don't merge words and numbers, etc. -- is that not an inductive bias?), and for some languages (e.g. Asian languages which don't use spaces) the "it's just for performance" argument doesn't make as much sense because there it has no spaces to split on, so the chunks of text are much longer (although it does still split on punctuation, etc.).
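To make the bias concrete, here is a rough sketch of such a pre-tokenizer. It is not the regex any real tokenizer ships: Python's stdlib `re` lacks `\p{L}`-style classes, so this is a simplified approximation, and the `pretokenize` name is made up for illustration.

```python
import re

# Simplified stand-in for the split patterns used by BPE-based LLM tokenizers.
PATTERN = re.compile(
    r"'(?:s|t|re|ve|m|ll|d)"   # common English contractions
    r"| ?[^\W\d_]+"            # optional leading space + a run of letters
    r"|\d"                     # every digit is its own chunk
    r"| ?(?:[^\s\w]|_)+"       # optional leading space + a run of punctuation
    r"|\s+",                   # whatever whitespace is left over
    re.IGNORECASE,
)


def pretokenize(text: str) -> list:
    """Split text into chunks; BPE merges never cross a chunk boundary."""
    return PATTERN.findall(text)


print(pretokenize("I've got 42 apples, ok?"))
# Text without spaces stays one long chunk (until punctuation shows up):
print(pretokenize("東京都に住んでいます"))
```

Because merges can't cross chunk boundaries, the learned vocabulary can never contain a token like "got 4" or "apples," no matter what the training data looks like. That is a hand-written prior, not something BPE learned.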

Can we not agree that the absolutist position of "Putting an inductive bias in your tokenizer seems just a terrible idea." (as in - any inductive bias) is not actually true, especially since people are actually doing it?

Note, I'm not actually arguing that hand-crafted morphological tokenizers are better. (Which is the straw man many people seem to be replying to.) I'm just arguing that it should be feasible to train your tokenizer in a more morphologically aware way, because BPE doesn't do that.

> The reason everyone went to BPE was because it was so dramatically better than morphology based tokenizers. [..] BPE already learns morphology because it sees the raw bytes.

The reason everyone went to BPE is because of the bitter lesson (and because you don't have to hardcode your whole vocabulary, i.e. no UNK tokens), and not because it's particularly good at learning the morphology of the actual text. It's trivial to show countless examples where it fails to do so.

New comment by kouteiheika in "Claude Token Counter, now with model comparisons"

kouteiheika — Mon, 20 Apr 2026 07:26:19 +0000

> This is almost certainly wrong.

So how would you explain the increase in token usage, considering the fact that conventionally tokenizers are trained to minimize the token usage within a given vocabulary budget?

> Putting an inductive bias in your tokenizer seems just a terrible idea.

You're already effectively doing this by the sheer fact of using a BPE tokenizer, and especially with modern BPE-based LLM tokenizers[1]. I agree trying to bake this manually in a tokenizer is most likely not a good idea, but I could see a world where you could build a better tokenizer training algorithm which would be able to better take the natural morphology of the underlying text into account.

[1] Example from Qwen3.6 tokenizer:

    "pretokenizers": [
      {
        "type": "Split",
        "pattern": {
          "Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
        },
        "behavior": "Isolated",
        "invert": false
      }
    ]
  },

New comment by kouteiheika in "Claude Token Counter, now with model comparisons"

kouteiheika — Mon, 20 Apr 2026 04:49:59 +0000

I'd guess it's because they don't want people to reverse engineer it.

Note that they're the only provider which doesn't make their tokenizer available offline as a library (i.e. the only provider whose tokenizer is secret).

New comment by kouteiheika in "Claude Token Counter, now with model comparisons"

kouteiheika — Mon, 20 Apr 2026 04:42:21 +0000

> Opus 4.7 tokenizer used 1.46x the number of tokens as Opus 4.6

Interesting. Unfortunately Anthropic doesn't actually share their tokenizer, but my educated guess is that they might have made the tokenizer more semantically aware to make the model perform better. What do I mean by that? Let me give you an example. (This isn't necessarily what they did exactly; just illustrating the idea.)

Let's take the gpt-oss-120b tokenizer as an example. Here's how a few pieces of text tokenize (I use "|" here to separate tokens):

    "Kill"    -> [70074]
    "Killed"  -> [192794]
    "kill"    -> [25752]
    "k|illed" -> [74, 7905]
    " kill"   -> [15874]
    " killed" -> [17372]

You have 3 different tokens which encode the same word ("Kill", "kill", and " kill" with a leading space) depending on its capitalization and whether there's a space before it or not, you have separate tokens if it's the past tense, etc.

This is not necessarily an ideal way of encoding text, because the model must learn by brute force that these tokens are, indeed, related. Now, imagine if you'd encode these like this:

    <cap>|kill
    <cap>|kill|ed
    kill
    kill|ed
    <space>|kill
    <space>|kill|ed

Notice that this makes much more sense now - the model now only has to learn what "<cap>" (a capitalization marker) is, what "kill" is, what "<space>" (a leading-space marker) is, and what "ed" (the past tense suffix) is, and it can compose those together. The downside is that it increases the token usage.
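A toy sketch of that trade-off, with everything in it made up for illustration: the `<cap>` and `<space>` marker names, the one-entry suffix list, and the naive suffix stripping (which would happily mangle a word like "red").

```python
from typing import List

CAP = "<cap>"        # hypothetical marker: the next piece starts with a capital
SPACE = "<space>"    # hypothetical marker: a space precedes the next piece
SUFFIXES = ("ed",)   # toy suffix list


def compose_tokens(piece: str) -> List[str]:
    """Encode one piece of text as markers + stem + suffix."""
    tokens: List[str] = []
    if piece.startswith(" "):
        tokens.append(SPACE)
        piece = piece[1:]
    if piece[:1].isupper():
        tokens.append(CAP)
        piece = piece[0].lower() + piece[1:]
    for suffix in SUFFIXES:
        if piece.endswith(suffix) and len(piece) > len(suffix):
            tokens += [piece[: -len(suffix)], suffix]
            break
    else:
        tokens.append(piece)
    return tokens


pieces = ["Kill", "Killed", "kill", "killed", " kill", " killed"]
encoded = [compose_tokens(p) for p in pieces]
vocabulary = {t for tokens in encoded for t in tokens}
total_tokens = sum(len(tokens) for tokens in encoded)
print(sorted(vocabulary), total_tokens)
```

The six pieces now share a vocabulary of only 4 entries, but they cost 13 tokens in total, compared to 7 in the gpt-oss-120b example above.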

So I wouldn't be surprised if this is what they did. Or, my guess #2, they removed the tokenizer altogether and replaced it with a small trained model (something like the Byte Latent Transformer) and simply "emulate" the token counts.

New comment by kouteiheika in "MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU"

kouteiheika — Wed, 08 Apr 2026 17:27:33 +0000

> Activation would still require gigabytes for a few kb context.

For that you use activation checkpointing, and you can also offload that to the CPU in a smart way to hide the latency. Although, yes, for long context training the activations do dominate the memory usage (and quantizing them degrades things more than just quantizing weights and/or optimizer states).
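For anyone unfamiliar with the idea, here is a framework-free toy of activation checkpointing (no CPU offloading, no real tensors). The scalar `tanh` chain and all function names are assumptions chosen to keep it short; in a real training loop you'd use your framework's checkpointing utility instead.

```python
import math
from typing import Dict, List


def layer(w: float, x: float) -> float:
    return math.tanh(w * x)


def grads_full(ws: List[float], x0: float) -> List[float]:
    """Baseline: keep every activation, then backpropagate."""
    xs = [x0]
    for w in ws:
        xs.append(layer(w, xs[-1]))
    grad_x, grads = 1.0, [0.0] * len(ws)
    for i in reversed(range(len(ws))):
        local = 1.0 - xs[i + 1] ** 2          # derivative of tanh at the output
        grads[i] = grad_x * local * xs[i]     # d(output) / d(w_i)
        grad_x = grad_x * local * ws[i]       # d(output) / d(x_i)
    return grads


def grads_checkpointed(ws: List[float], x0: float, segment: int) -> List[float]:
    """Keep only each segment's input; recompute the rest during backward."""
    checkpoints: Dict[int, float] = {}
    x = x0
    for i, w in enumerate(ws):
        if i % segment == 0:
            checkpoints[i] = x                # the only activations we keep
        x = layer(w, x)
    grad_x, grads = 1.0, [0.0] * len(ws)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(ws))
        xs = [checkpoints[start]]             # recompute this segment's activations
        for i in range(start, end):
            xs.append(layer(ws[i], xs[-1]))
        for i in reversed(range(start, end)):
            local = 1.0 - xs[i - start + 1] ** 2
            grads[i] = grad_x * local * xs[i - start]
            grad_x = grad_x * local * ws[i]
    return grads
```

The gradients come out the same either way; you pay one extra forward pass per segment and in exchange hold roughly one checkpoint per segment plus a single segment's activations, instead of all of them.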

New comment by kouteiheika in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

kouteiheika — Wed, 08 Apr 2026 17:21:12 +0000

I have no idea; you have to check their docs.

AFAIK what they do is that they calculate a hash of the true thinking trace, save it into a database, and only send those hashes back to you (try to man-in-the-middle Claude Code and you'll see those hashes). So then when you send them back your session's history you include those hashes, they look them up in their database, replace them with the real thinking trace, and hand that off to the LLM to continue generation. (All SOTA LLMs nowadays retain reasoning content from previous turns, including Claude.)
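To be clear about what I mean, here's a sketch of the mechanism as I described it. This is purely illustrative and not how their backend is actually written: `TraceStore`, `rebuild_history`, the message layout and the choice of SHA-256 are all my own assumptions.

```python
import hashlib
from typing import Dict, List


class TraceStore:
    """Hypothetical server-side store mapping opaque handles to real traces."""

    def __init__(self) -> None:
        self._traces: Dict[str, str] = {}

    def seal(self, trace: str) -> str:
        """Save the real trace and return only its hash to the client."""
        handle = hashlib.sha256(trace.encode("utf-8")).hexdigest()
        self._traces[handle] = trace
        return handle

    def unseal(self, handle: str) -> str:
        return self._traces[handle]


def rebuild_history(history: List[dict], store: TraceStore) -> List[dict]:
    """Swap handles back for real traces before handing history to the model."""
    restored = []
    for message in history:
        if message["type"] == "thinking":
            message = {**message, "content": store.unseal(message["content"])}
        restored.append(message)
    return restored
```

The client only ever holds the handle, so it can replay the session without being able to read (or distill from) the trace itself.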

New comment by kouteiheika in "MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU"

kouteiheika — Wed, 08 Apr 2026 17:15:14 +0000

Yes.

Although this wasn't integrated into PyTorch itself (but into torchtune, which is a different thing). If you're writing your own training loop you need to use a third-party kernel, e.g. the Liger kernel mentioned in the article, or Cut Cross Entropy (which is much better than the Liger one, although IIRC it has a numeric bug in one of its kernels making the results very slightly off).

New comment by kouteiheika in "MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU"

kouteiheika — Wed, 08 Apr 2026 14:53:55 +0000

> You can only give it a try, but don't get your hopes high on a large context.

You may or may not know this, but: when training off-the-shelf LLMs (i.e. ones which have a huge vocabulary) what consumes a huge amount of memory is calculating the cross-entropy loss (which gets worse the more tokens you stuff in your batch), so always use a fused cross-entropy kernel.

For example, for a Gemma 2 model with 2B parameters at a batch size of 8k this consumes 24GB of VRAM by default (!); you can fuse your cross-entropy loss with @torch.compile and that can cut down this memory usage to something like a few gigabytes, but with a dedicated kernel this becomes a few megabytes.
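A back-of-the-envelope sketch of where that memory goes, plus a toy chunked loss. The assumptions: a vocabulary of 256,000 entries for Gemma 2, "8k" meaning 8192 tokens, and fp32 logits. The function names are made up, and the pure-Python loss only illustrates the idea behind a fused kernel (never hold all logits at once); it is not any particular kernel's algorithm.

```python
import math
from typing import List


def logits_bytes(tokens: int, vocab: int, bytes_per_value: int = 4) -> int:
    """Size of one full [tokens x vocab] logits buffer."""
    return tokens * vocab * bytes_per_value


def chunk_loss_sum(hidden: List[List[float]], weights: List[List[float]],
                   targets: List[int]) -> float:
    """Materialize logits for this chunk only, then reduce them to a scalar."""
    logits = [[sum(a * b for a, b in zip(h, row)) for row in weights] for h in hidden]
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[target]
    return total


def chunked_cross_entropy(hidden: List[List[float]], weights: List[List[float]],
                          targets: List[int], chunk: int) -> float:
    """Mean cross-entropy, holding at most [chunk x vocab] logits at a time."""
    total = 0.0
    for start in range(0, len(hidden), chunk):
        total += chunk_loss_sum(hidden[start:start + chunk], weights,
                                targets[start:start + chunk])
    return total / len(hidden)


print(logits_bytes(8192, 256_000) / 2**30, "GiB per fp32 logits buffer")
```

Under those assumptions a single logits buffer is already about 7.8 GiB, so a few such buffers alive at the same time (e.g. the logits, an upcast copy, and their gradient) would land in the neighborhood of the number I quoted.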

New comment by kouteiheika in "MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU"

kouteiheika — Wed, 08 Apr 2026 14:37:53 +0000

This isn't really anything new; I've been doing something like this for quite a while, I just haven't bothered writing a paper. (: Probably anyone who would seriously tackle the problem of "how do I train a huge model on a tiny amount of VRAM?" would come up with something similar.

However, most people in the field don't, because the actual practical utility of training huge models on a single GPU is quite low. (e.g. they got 341 tok/s for a 14B model on a single 3090 while with my method I was getting ~1k tok/s on a single 4090; that's still very slow)

Also, there are more tricks one can use to speed up training/lower VRAM usage which they're not using. For example, you don't need any gradient offloading (you can just accumulate the gradients directly into the optimizers' states if you modify your optimizer), you can use Muon instead of Adam (which needs only half the VRAM of Adam), you can use quantization (both for parameters and for the optimizer states; e.g. I found Muon quantized into 4-bit working relatively well), etc.
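A rough sketch of two of these tricks. The assumptions: Adam keeps two state values per parameter and Muon keeps one momentum value, quantization overhead (scales etc.) is ignored, and the second half is just one way to read "accumulate gradients directly into the optimizer state", shown on a plain momentum buffer. All function names are made up.

```python
from typing import List


def optimizer_state_gib(params: int, states_per_param: int, bits_per_value: int) -> float:
    """Memory taken by optimizer states, ignoring any quantization metadata."""
    return params * states_per_param * bits_per_value / 8 / 2**30


def momentum_with_grad_buffer(m: List[float], micro_grads: List[List[float]],
                              beta: float) -> List[float]:
    """Usual way: sum micro-batch gradients in a separate buffer, then update."""
    grad = [0.0] * len(m)
    for g in micro_grads:
        grad = [a + b for a, b in zip(grad, g)]
    return [beta * mi + gi for mi, gi in zip(m, grad)]


def momentum_accumulate_into_state(m: List[float], micro_grads: List[List[float]],
                                   beta: float) -> List[float]:
    """Decay the momentum once, then add each micro-batch gradient straight
    into it, so no separate accumulated-gradient buffer has to exist."""
    m = [beta * mi for mi in m]
    for g in micro_grads:
        m = [mi + gi for mi, gi in zip(m, g)]
    return m


n = 14_000_000_000  # a 14B-parameter model
print("Adam fp32 :", optimizer_state_gib(n, 2, 32), "GiB")
print("Muon fp32 :", optimizer_state_gib(n, 1, 32), "GiB")
print("Muon 4-bit:", optimizer_state_gib(n, 1, 4), "GiB")
```

Both momentum functions produce the same buffer up to floating-point rounding; the second one just never needs memory for the summed gradients.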

New comment by kouteiheika in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

kouteiheika — Tue, 07 Apr 2026 14:50:35 +0000

> Can I just see the actual thinking (not summarized) so that I can see the actual thinking without a latency cost?

You can't, and Anthropic will never allow it since it allows others to more easily distill Claude (i.e. "distillation attacks"[1] in Anthropic-speak, even though Anthropic is doing essentially exactly the same thing[2]; rules for thee but not for me).

[1] -- https://www.anthropic.com/news/detecting-and-preventing-dist...

[2] -- https://www.npr.org/2025/09/05/g-s1-87367/anthropic-authors-...

New comment by kouteiheika in "German implementation of eIDAS will require an Apple/Google account to function"

kouteiheika — Sun, 05 Apr 2026 13:57:10 +0000

You can, but you lose access to anything that was associated with your old account.

Another fun thing Google did is to automatically (without my consent) add a required second-factor authentication to my current Google account. I have this old, e-waste tier phone that I use mostly only as a glorified alarm clock, and at one point I used it to log into my current Google account.

Imagine my surprise when I tried to log in to my Google account from somewhere else, and it asked me for an authentication code from this phone. Again, I have never explicitly set it up as such - Google did this automatically! So if I were to lose this phone I'd be screwed yet again, with yet another inaccessible Google account that I will have no way of recovering.

At this point I don't depend on any Big Tech services; my Google account has nothing of value associated with it (only my YouTube subscription list, which is easy enough to back up and restore), and I pay for my own email on my own domain, etc. So if I get screwed over yet again by a big, soulless corporation that just sees me as a number on their bottom-line, well, I just won't care.

New comment by kouteiheika in "German implementation of eIDAS will require an Apple/Google account to function"

kouteiheika — Sun, 05 Apr 2026 10:41:37 +0000

This is not a hypothetical problem and you don't need to be deliberately targeted. It actually happens to normal people. And if it does you have absolutely zero recourse.

Source: I have a banned Google account (it's over 20 years old at this point). I know the password, but Google doesn't let me log into it. Every few years I try, unsuccessfully, to recover it.

If you have a Google account and having it banned would be a problem for you here's my advice: migrate. Right now. You never know when one of their bots will deem you a persona non grata.