Hacker News: ainch

New comment by ainch in "Delayed Gratification – Proud to Be 'Last to Breaking News'"

ainch — Tue, 28 Jul 2026 16:31:00 +0000

It's heartening to see that Delayed Gratification has successfully continued their work for so long.

I used to work at Tortoise (who now own/operate The Observer) and our motto was 'Slow News'. We operated on a pretty transparent membership model, but I got the impression that it was difficult to make pure membership work financially, leading to a reliance on other revenue streams like podcasting and business partnerships over time.

New comment by ainch in "PyTorch: A Reference Language"

ainch — Tue, 28 Jul 2026 16:07:07 +0000

I think this idea is exactly right. Many ML papers that include pseudocode basically just write out a minimal PyTorch training loop - down to the loss.backward(). It has, along with Python, become something of a lingua franca for research in the field.
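For readers who haven't seen one, here is a rough sketch of the kind of minimal training loop such pseudocode tends to mirror. The toy data (y = 2x + 1), the single linear layer, the learning rate and the step count are all assumptions chosen for illustration, not taken from any particular paper.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression data (assumed for illustration): y = 2x + 1, no noise.
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
y = 2.0 * x + 1.0

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass and loss
    loss.backward()              # backpropagate
    optimizer.step()             # update parameters
    losses.append(loss.item())
```

The four lines inside the loop are the part that usually shows up in paper pseudocode more or less verbatim.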

New comment by ainch in "About the security content of macOS Tahoe 26.6"

ainch — Tue, 28 Jul 2026 11:48:06 +0000

Some data is so sensitive it likely has to stay on premises though.

New comment by ainch in "Kimi K3 is not cheap"

ainch — Sun, 26 Jul 2026 23:45:01 +0000

It is cheaper than GPT-5.6 and Anthropic's models. But then on coding agents specifically (if we take artificial analysis, at least - I'm sure there's a better meta-review you could do) Grok-4.5 scores better and is 20% cheaper. Of course there are other reasons you might prefer Kimi to Grok. But still, if coding is the main concern, is Kimi opening up somewhere new on the cost-capability Pareto frontier? Not necessarily.

New comment by ainch in "Kimi K3 is not cheap"

ainch — Sun, 26 Jul 2026 22:20:15 +0000

No, I think you're right that the amount of compute spent on office work is lower than coding - although I don't have any sense for the right share. The best source I could find was an OpenAI report [1] which mentions that ~64% of enterprise token generation is via Codex, which I would expect to skew entirely towards coding. But it's hard to say how the remainder is split, what proportion is 'frontier', or whether it's representative for Anthropic.

On your questions - I've spoken to a number of execs and seniors behind closed doors but nothing public I can point to. Anecdotally, I've spoken to senior leaders at banks spending billions of tokens on one-off tasks like prepping execs for earnings calls or piloting end-to-end agent workflows for specific use cases (but mostly piecemeal/one-off).

On Fable, financial analysts I know are using it to produce research docs, models and decks - I hear that it's a big improvement for these tasks. This lot have been blindsided by the spend growth [2], the same as for coders in enterprise (e.g. Uber blowing annual budget in 4 months [3]), so I do think there's appetite and budget for a capable, cheaper open model - but, due to the price, Kimi does not obviously fill that role the way it might for coding. That said, I still largely agree with you on share - where coding has seen a broad deployment across software development, most of the office work stuff is still fairly piecemeal and certainly lower compute-spend.

I think it's fair to say I could've focussed on coding more rather than taking AA's benchmark distribution as representative - perhaps a more balanced title would be "Kimi K3 is not cheap across the board"? I guess there's also some ambiguity about what 'cheap' means - as I said elsewhere in this thread, I think when some people talk about the price of Chinese models, they imagine Deepseek competing with o1 for 1/20th of the price. Even though it is better priced for coding, Kimi isn't Deepseek-level cheap.

I do, however, think you could debate whether coding will remain at >50% total token usage going forwards - big enterprises are hunting for ways to get value out of LLMs, and the labs are investing a correspondingly large amount in generating demonstrations and RL environments to get the models up to par. At the end of the day, programmers make up ~5% of all white collar work. Of course, it's also possible that Chinese labs will shift focus to white collar applications now they've demonstrated a lead on coding cost efficiency, so, I mean who knows - it'll be interesting to get some detail when Anthropic IPOs.

Sorry for the long reply! Appreciate it's quite meandering...

[1] https://cdn.openai.com/pdf/5d1e1489-21c0-43e4-9d42-f87efdbf0...

[2] https://www.reuters.com/business/finance/australias-cba-flag...

[3] https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-c...

New comment by ainch in "Kimi K3 is not cheap"

ainch — Sun, 26 Jul 2026 20:41:17 +0000

Agreed, Kimi is cheaper for coding - I say that explicitly in the post too. However I'd have to disagree with you on the "office task" front.

General office work is one of the big frontiers the labs are pushing on, and it's part of how they're justifying the value proposition to enterprise customers. It also accounts for a big portion of the spend on RL: tasks/environments designed to train agents to navigate Slack or Salesforce. If you're Anthropic pitching Claude to a bank (taking an example I'm familiar with), coding probably accounts for ~20% tops of the workforce, and it doesn't drive direct revenues. The 'agentic coding bump', but for all your analysts, traders, and wealth managers, would be a much more attractive prospect.

I don't disagree that coding is the most successful use case so far (and probably more relevant to a HN audience). But I think the future of the labs is also contingent on them making progress on more general white collar work. I suspect that's why the Opus 5 release blog lists 3 coding benchmarks (FrontierBench, DeepSWE and FrontierCode) to 3 or 4 more general ones applicable to office work - depending on how you slice it (GDPVal, AutomationBench, Legal Agent Benchmark, BrowseComp).

New comment by ainch in "Kimi K3 is not cheap"

ainch — Sun, 26 Jul 2026 20:27:24 +0000

That's a very fair critique.

I don't mean to imply that Kimi is not at all cheaper than U.S. frontier models. I wrote this more because I believe - since Chinese LLMs entered the public consciousness via DeepSeek R1, which was genuinely ~20x cheaper than o1 - there's a bit of a halo effect around Chinese models which causes people to overestimate the scale of the discount. And relative to that price anchor, Kimi is less extraordinarily cheap.

At the moment Kimi is ~10% cheaper than GPT-5.6 on the AA benchmark, and as you say that could go down to 20-30% cheaper (although I don't know how inference provider discounts play out on real world usage once you account for quantisation etc...). I'm not trying to suggest that that's nothing, but I do think some of the people driving the Chinese AI discourse would have a harder time pitching their conclusions if they were saying "this new Chinese model is 10% cheaper on some tasks, and it might get another 20% cheaper in the future".
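To put the figures quoted above on one scale, here is a quick sketch of the arithmetic. It assumes "N times cheaper" means the price is divided by N, and "p% cheaper" means the price is multiplied by (1 - p). The function names are made up for illustration.

```python
def discount_from_ratio(times_cheaper: float) -> float:
    """Fractional discount implied by being `times_cheaper` times cheaper."""
    return 1.0 - 1.0 / times_cheaper


def ratio_from_discount(discount: float) -> float:
    """How many times cheaper a fractional `discount` makes something."""
    return 1.0 / (1.0 - discount)


# The ~20x anchor quoted for DeepSeek R1 vs o1, as a percentage discount.
print(f"20x cheaper = {discount_from_ratio(20):.0%} discount")  # 95% discount

# The ~10% and 20-30% figures quoted for Kimi, as "times cheaper".
for d in (0.10, 0.20, 0.30):
    print(f"{d:.0%} cheaper = {ratio_from_discount(d):.2f}x cheaper")
# 10% -> 1.11x, 20% -> 1.25x, 30% -> 1.43x
```

Under those assumptions, even the optimistic 30% case is about 1.4x cheaper, a long way from the 20x anchor.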

Kimi K3 is not cheap

ainch — Sun, 26 Jul 2026 19:37:09 +0000

Article URL: https://www.alexinch.com/blog/kimi-k3

Comments URL: https://news.ycombinator.com/item?id=49061620

Points: 22

# Comments: 23

New comment by ainch in "Kimi K3 Is Competitive with Fable; Kimi K3 and Fable Is SoTA"

ainch — Wed, 22 Jul 2026 15:45:25 +0000

It's telling that in their release post, Moonshot themselves said that K3 lags Fable and GPT-5.6 in "user experience". I took that to mean the stuff you can't push directly via RL, what some people call "big model smell".

"Despite being a highly competitive model overall, K3 nonetheless exhibits a noticeable gap in user experience compared with Claude Fable 5 and GPT 5.6 Sol."

https://www.kimi.com/blog/kimi-k3

New comment by ainch in "OpenAI says its AI went rogue and launched 'unprecedented' cyber-attack"

ainch — Wed, 22 Jul 2026 14:29:48 +0000

I appreciate people thinking this is a marketing ploy, but at the same time, OpenAI have just had to delay a model release because of government attention on cybersecurity risk. This incident will increase the attention on them specifically.

Even if it is a marketing ploy, I could see this stuff backfiring catastrophically - after all, they have just illegally hacked a 3rd party via a model they can't control properly. Any serious person in government (US or otherwise) will look at this and say "these guys have no idea what they're doing".

New comment by ainch in "OpenAI says its AI went rogue and launched 'unprecedented' cyber-attack"

ainch — Wed, 22 Jul 2026 12:59:24 +0000

HuggingFace posted an incident report a week ago, which makes it much more likely that this happened. I understand people are suspicious of OpenAI, but I don't think there's any reason to believe this is a made-up event.

https://huggingface.co/blog/security-incident-july-2026

The Boring Frontier of Robotics

ainch — Wed, 22 Jul 2026 11:27:53 +0000

Article URL: https://www.alexinch.com/blog/boring-frontier

Comments URL: https://news.ycombinator.com/item?id=49005035

Points: 2

# Comments: 0

New comment by ainch in "Less Is More: Why Audio on SoundCloud Looks Different"

ainch — Sun, 19 Jul 2026 19:12:52 +0000

I've been sharing music on SoundCloud for a decade and I am genuinely interested in this topic, but I cannot stand how strongly this post reeks of LLM writing. Do people releasing this kind of content not notice all the clichés?

New comment by ainch in "The Little Book of Reinforcement Learning"

ainch — Fri, 17 Jul 2026 10:32:45 +0000

There is a field of hierarchical RL in which the optimisation occurs over a range of time scales/abstraction. But I'm not aware of much practical success for these approaches so far.

New comment by ainch in "The Little Book of Reinforcement Learning"

ainch — Fri, 17 Jul 2026 10:31:34 +0000

Do you have a good source on this information theory framing? I don't remember it being covered in Sutton & Barto.

New comment by ainch in "Yann LeCun on AMI Labs, JEPA, and the AI World of 2030"

ainch — Thu, 16 Jul 2026 21:28:54 +0000

We're still far from solving real-world physical interaction, even with the world knowledge of LLMs incorporated into VLAs. It's the reason robotics startups trying to deploy into people's homes are so reliant on teleop; the hardware is sufficient (we know that because the robots can solve tasks when piloted by a human), but the intelligence is still insufficient.

I don't think JEPA is necessarily the solution (although I'm bullish on world models), but I don't understand why people feel so infuriated about LeCun. The reason he is famous today is that he spent years taking a contrarian stance, working on neural networks when they were seen as dead and buried. He was eventually proven right. Nowadays he holds strong views which contradict the LLM zeitgeist - what's shocking about that?

New comment by ainch in "AI 2040: Plan A"

ainch — Sat, 11 Jul 2026 11:01:49 +0000

One of the lead authors, Daniel Kokotajlo, worked at OpenAI for years before quitting. In 2021 he wrote a remarkably accurate forecast of how LLMs would develop over the following 5 years[0].

I think it should be obvious that he understands that LLMs are trained via next-token prediction.

[0]: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-...

New comment by ainch in "GPT-5.6"

ainch — Fri, 10 Jul 2026 06:22:39 +0000

Yann is a big SSL guy but I don't think he was involved in the original DINO - he's not listed as a co-author or anything.

New comment by ainch in "The classifiers Anthropic puts in front of Fable are too zealous"

ainch — Wed, 08 Jul 2026 23:06:10 +0000

The most basic machine learning-related query gets flagged for me. For example:

  In flax nnx, what's the idiomatic way to store state on a Module. For example, if I'm handling the carry manually for an nnx.RNN.

Or one asking about a checkpointing package:

  How do I restore one of the orbax checkpoints into NNX from this script?

I also got flagged for asking about syntax highlighting in the Helix editor.

It's a shame - I like Fable for writing tasks over ChatGPT and I do believe Anthropic is a more ethical outfit than OpenAI. But with the safeguards (and Fable access expiring in a few days) there's no reason to pay for draconian guardrails and harsh rate limits.

New comment by ainch in "Grok 4.5"

ainch — Wed, 08 Jul 2026 22:41:38 +0000

There have been papers about model collapse, but the underlying assumption is that you constantly train on only the outputs of the previous model. Later research has shown that as long as you retain some "real" data, training on largely synthetic data is ok.

And in the case the previous poster describes, the other model doesn't generate datasets; it generates environments which the next generation interacts with and learns from.