<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: kgeist</title><link>https://news.ycombinator.com/user?id=kgeist</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 10:36:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=kgeist" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by kgeist in "ML promises to be profoundly weird"]]></title><description><![CDATA[
<p>I agree the original poster exaggerated it. But models have indeed generally stopped growing at around 1-1.5 trillion parameters, at least for the last couple of years.<p>>Even now, I don't know if parameter count stopped mattering or just matters less<p>Models in the 20b-100b range are already very capable when it comes to basic knowledge, reasoning, etc. Improved architectures and better training recipes have decreased the required parameter count considerably (currently, 8b models can easily beat the 175b GPT-3 from 3 years ago in many domains). What increasing the parameter count currently gives you is better memorization, i.e. better world knowledge without having to consult external knowledge bases, say, via RAG. For example, Qwen3.5 can one-shot compilable code, reason, etc., but can't remember the exact API calls to many libraries, while Sonnet 4.6 can. I think what we need is to split models into two parts: a "reasoner" and a "knowledge base". The reasoner could be pretty static with infrequent updates; it's the knowledge base part that needs continuous updates (and trillions of parameters). Maybe we could have a system where a reasoner chooses different knowledge bases on demand.</p>
]]></description><pubDate>Thu, 09 Apr 2026 11:25:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47702210</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47702210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47702210</guid></item><item><title><![CDATA[New comment by kgeist in "ML promises to be profoundly weird"]]></title><description><![CDATA[
<p>>The code being simple doesn't mean much when all the complexity is encoded in billions of learned weights. The forward pass is just the execution mechanism. Conflating its brevity with simplicity of the underlying computation is a basic misunderstanding of what a forward pass actually is. What you've just said is the equivalent of saying blackbox.py is simple because 'python blackbox.py' only took 1 line. It's just silly reasoning.<p>Look at what a transformer actually does. Attention is a straightforward dictionary lookup in something like 3 matmuls. An FFN is a simple space transform with a non-linear cutoff to adjust the signal (i.e. a few more matmuls and an activation function) before doing a new dictionary lookup in the next transformer block. Add a few tricks like residual connections and output projections, and repeat N times.<p>So yeah, the actual inference code is 50 lines, and the rest is large learned dictionaries to search in, with some transforms. So you're saying my one-liner program that consults a DB with 1 million rows is actually 1 million lines of code? Well, not quite.<p>This trick, coupled with lots of prelearned templates, is enough to fool people into believing there's "there" there (the OP's post above). Just like ELIZA back in the day. Apparently this trick is enough to solve lots of problems, because lots of problems only require search in a known problem (template) space (also with reduced dimensionality). But it's still just a fancy search algorithm. I think the whole thing about "emergent behavior" is that when a human is confronted with a huge prelearned concept space, it's so large they cannot digest what is actually happening, and they tend to ascribe magical properties to it like "intelligence" or "consciousness". For example, imagine there was a huge precreated IF..THEN table for every possible question/answer pair a finite human might ask in their lifetime.
It would appear to the human that there's intelligence, that there's "there" there. But at the end of the day it would be just a static table with nothing really interesting happening inside of it. A transformer is just a nice trick that lets you compress this huge IF..THEN table into a few hundred gigabytes.<p>>So? I can pick the least likely token every time. The result would be garbage but that doesn't say anything about the model. The popular strategy is to randomly pick from the top n choices. What do you think is keeping thousands of tokens coherent and on point even with this strategy? Why don't you try sampling without a large language model to back it and see how well that goes for you<p>I was referring to the OP post's:<p><pre><code>  there is no "there" there
</code></pre>
It doesn't even "know" what the actual text continuation must be, strictly speaking. It just returns a list of probabilities from which we must select. It can't select by itself. To go from "list of probabilities" to "chatbot" requires additional hardcoded code (no AI involved) that greatly influences how the chatbot behaves and feels. Imagine if an actual sentient being had a button: you press it, and suddenly Steven the sailor becomes a Chinese lady who discusses Confucius. Or starts saying random gibberish. There's no independent agency whatsoever. It's all a bunch of clever tricks.<p>>What do you think happens when you remove or corrupt arbitrary regions of the human brain? People can lose language, vision, memory, or reasoning, sometimes catastrophically.<p>In an actual brain, the structure of the connectome itself drives a lot of behavior. In an LLM, all connections are static and predefined. A brain is much more resistant to failure: there are humans who live normal lives with a full hemisphere removed, while in an LLM, changing a single hypersensitive neuron can lead to full model collapse.</p>
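<p>For the curious, the "few matmuls" structure described above can be sketched in a handful of lines of numpy. This is a toy single-head block with made-up names and random weights, just to show the shape of the computation, not any real model's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # the "dictionary lookup": queries match keys, scores select values
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def ffn(x, W1, W2):
    # space transform with a non-linear cutoff, then project back
    return np.maximum(0.0, x @ W1) @ W2

def block(x, Wq, Wk, Wv, W1, W2):
    # one transformer block: attention + FFN, with residual connections
    x = x + attention(x, Wq, Wk, Wv)
    return x + ffn(x, W1, W2)
```

<p>Everything interesting lives in the weight matrices; the code itself is just lookups and transforms, repeated N times.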
]]></description><pubDate>Thu, 09 Apr 2026 09:43:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47701367</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47701367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47701367</guid></item><item><title><![CDATA[New comment by kgeist in "Who is Satoshi Nakamoto? My quest to unmask Bitcoin's creator"]]></title><description><![CDATA[
<p>>the same Napster vs Gnutella analogy, the same celebrity email filtering idea, the same obscure FDR gold ban interest, the same weird hyphenation errors<p>Dunno, this assumes their cypherpunk group only ever discussed cryptography and nothing else. These could just be off-topic ideas floating around in their community.<p>For me, the only solid, damning evidence would be statistical text analysis (stylometry), like what's done to establish the authorship of literary works.</p>
]]></description><pubDate>Wed, 08 Apr 2026 22:38:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47697153</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47697153</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47697153</guid></item><item><title><![CDATA[New comment by kgeist in "ML promises to be profoundly weird"]]></title><description><![CDATA[
<p>>and while I agree humans can make similar mistakes/confabulations, I overwhelmingly feel that there is no "there" there.<p>What really opened my eyes a couple weeks ago (anyone can try this): I asked Sonnet to write an inference engine for Qwen3, from scratch, without any dependencies, in pure C. I gave it the GGUF specs for parsing (to quickly load existing models) and Qwen3's architecture description. The idea was to see the minimal implementation without all the framework fluff or abstractions. Sonnet was able to one-shot it, and it worked.<p>And you know what, Qwen3's entire forward pass is just 50 lines of very simple code (mostly vector-matrix multiplications).<p>The forward pass is only part of the story: you just get a list of token probabilities from the model, that's all. After the pass, you need to choose the sampling strategy: how to pick the next token from the list. And this is where you can easily make the whole model much dumber, more creative, more robotic, or make it collapse entirely, just by choosing different decoding strategies. So a large part of a model's perceived performance/feel is not even in the neurons but in some hardcoded, manually written function.<p>Then I also performed "surgery" on this model by removing/corrupting layers and seeing what happens. If you do this exercise, you can see that it's not intelligence. It's just a text transformation algorithm. Something like a "semantic template matcher". It generates output by finding, matching, and combining several prelearned semantic templates. A slight perturbation in one neuron can break the "finding" part, and it collapses entirely: it can't find the correct template to match, and the whole illusion of intelligence breaks. Its corrupted output is what you'd expect from corrupting a pure text manipulation algorithm, not a truly intelligent system.</p>
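<p>A minimal sketch of what that hardcoded sampling function looks like (toy numpy code with my own names; real engines add more knobs like top-p and repetition penalties). Temperature and top-k alone can swing the same weights from deterministic to incoherent:

```python
import numpy as np

def decode_step(logits, temperature=1.0, top_k=None, rng=None):
    # Everything below is hand-written, hardcoded logic: the learned
    # part of the model ended when the forward pass produced `logits`.
    rng = rng or np.random.default_rng()
    scaled = logits / max(temperature, 1e-8)
    if top_k is not None:
        # mask out everything outside the k most likely tokens
        cutoff = np.sort(scaled)[-top_k]
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

<p>top_k=1 or near-zero temperature gives the deterministic, "robotic" behavior; crank the temperature and the exact same model produces gibberish.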
]]></description><pubDate>Wed, 08 Apr 2026 22:05:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47696888</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47696888</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47696888</guid></item><item><title><![CDATA[New comment by kgeist in "ML promises to be profoundly weird"]]></title><description><![CDATA[
<p>>Raw parameter counts stopped increasing almost 5 years ago, and modern models rely on sophisticated architectures like mixture-of-experts, multi-head latent attention, hybrid Mamba/Gated linear attention layers, sparse attention for long context lengths, etc.<p>Agreed. I recently updated our office's little AI server to use Qwen 3.5 instead of Qwen 3, and the capability has increased considerably, even though the new model has fewer parameters (32b => 27b).<p>Yesterday I spent some time investigating it:<p>- Gated DeltaNet (invented in 2024, I think) in Qwen3.5 saves memory for the KV cache, so we can afford larger quants<p>- larger quants => more accurate<p>- I updated the inference engine to have TurboQuant's KV rotations (2026) => the 8-bit KV cache is more accurate<p>- smaller KV cache requirements => larger contexts<p>Before, Qwen3 on this humble infra could not properly function in OpenCode at all (wrong tool calls, generally dumb, small context); now Qwen 3.5 can solve 90% of the problems I throw at it.<p>All that thanks to algorithmic/architectural innovations, while actually decreasing the parameter count.</p>
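<p>The KV-cache arithmetic behind "smaller cache => larger contexts" fits on a napkin. A sketch with hypothetical model dimensions (not Qwen3.5's actual config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # one K and one V vector per layer, per KV head, per position
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# hypothetical 32-layer model with 8 KV heads of dim 128 at a 32k context:
fp16 = kv_cache_bytes(32, 8, 128, 32768, 2)  # 4 GiB
int8 = kv_cache_bytes(32, 8, 128, 32768, 1)  # 2 GiB: same context, half the memory
```

<p>Halving bytes_per_elem (fp16 -> int8 cache) either halves the memory or doubles the affordable context, and accurate 8-bit rotations make that trade nearly free.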
]]></description><pubDate>Wed, 08 Apr 2026 20:57:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47696164</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47696164</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47696164</guid></item><item><title><![CDATA[New comment by kgeist in "Who is Satoshi Nakamoto? My quest to unmask Bitcoin's creator"]]></title><description><![CDATA[
<p>>I read up to here, but I wasn't convinced that this is the revelation that the author claims<p>The rest of the arguments are just as weak:<p>1) both released open-source software<p>2) both don't like spam<p>3) both like using pseudonyms online<p>4) both love freedom<p>5) both are anti-copyright<p>etc.<p>Basically, the author found that Adam Back used the same words on X as Satoshi did in some emails (including such rare words as "dang," "backup," and "abandonware") and then decided to find every possible "link" they could to build the case, even if most of the links are along the lines of "Both are humans! Coincidence? I think not."</p>
]]></description><pubDate>Wed, 08 Apr 2026 06:25:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47686120</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47686120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47686120</guid></item><item><title><![CDATA[New comment by kgeist in "German police name alleged leaders of GandCrab and REvil ransomware groups"]]></title><description><![CDATA[
<p>Schukin isn't a very common last name (definitely not Ivanov-tier). The first name, the patronymic (his father is Maksim), and the last name all match, as well as the city (the article says he lives in Krasnodar). In fact, this Krasnodar-based entrepreneur is the only person who shows up in the search at all for "Daniil Maksimovich Schukin". Not to mention the business was registered right when the ransoms started (2019). Too many coincidences if it's just a namesake.</p>
]]></description><pubDate>Tue, 07 Apr 2026 08:45:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47672340</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47672340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47672340</guid></item><item><title><![CDATA[New comment by kgeist in "Got kicked out of uni and had the cops called for a social media website I made"]]></title><description><![CDATA[
<p>>just report the post, i would have taken it down<p>Last time someone asked you to take down a post, you said "bitch come suck my dick," according to your own blog.</p>
]]></description><pubDate>Mon, 06 Apr 2026 22:16:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668053</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47668053</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668053</guid></item><item><title><![CDATA[New comment by kgeist in "German police name alleged leaders of GandCrab and REvil ransomware groups"]]></title><description><![CDATA[
<p>Found his record in Russia's official company registry. This is what he officially does as an entrepreneur:<p><pre><code>  56.10 — Restaurant activities and food delivery services

  47.23 — Retail sale of fish, crustaceans, and mollusks in specialized stores

  47.25.12 — Retail sale of beer in specialized stores

  47.25.2 — Retail sale of soft drinks in specialized stores

  47.29.39 — Retail sale of other food products in specialized stores, not included in other groups

  68.20 — Lease and management of own or leased real estate
</code></pre>
Money is reinvested into selling beer and fish :) Interestingly, he registered all that in 2019, just when the ransoms started.</p>
]]></description><pubDate>Mon, 06 Apr 2026 14:36:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47661501</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47661501</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47661501</guid></item><item><title><![CDATA[New comment by kgeist in "Gemma 4 on iPhone"]]></title><description><![CDATA[
<p>Qwen3.5 comes in various sizes (including 27B), and judging by the posts on HN, r/LocalLlama, etc., it seems to be better at logic/reasoning/coding/tool calling compared to Gemma 4, while Gemma 4 is better at creative writing and world knowledge (basically, nothing has changed from the Qwen3 vs. Gemma3 era).</p>
]]></description><pubDate>Sun, 05 Apr 2026 21:51:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47654257</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47654257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47654257</guid></item><item><title><![CDATA[New comment by kgeist in "[dead]"]]></title><description><![CDATA[
<p>I think the headline is misleading. It's some random fork of llama.cpp; I can't find evidence that TurboQuant was actually added to llama.cpp proper.<p>The only legit PR I can find is this one [0], and it's still open.<p>There are currently a lot of rejected vibe-coded PRs: [1] (violation of the AI policy).<p>The OP's PR says it was generated with Claude Code, so it has a very low chance of getting merged upstream.<p>[0] <a href="https://github.com/ggml-org/llama.cpp/pull/21089" rel="nofollow">https://github.com/ggml-org/llama.cpp/pull/21089</a><p>[1] <a href="https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%3Aclosed" rel="nofollow">https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...</a></p>
]]></description><pubDate>Sat, 04 Apr 2026 10:27:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47637762</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47637762</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47637762</guid></item><item><title><![CDATA[New comment by kgeist in "Qwen3.6-Plus: Towards real world agents"]]></title><description><![CDATA[
<p>They've always had closed-source variants:<p>- Qwen3.5-Plus<p>- Qwen3-Max<p>- Qwen2.5-Max<p>etc. Nothing really changed so far.</p>
]]></description><pubDate>Thu, 02 Apr 2026 15:35:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47615873</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47615873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47615873</guid></item><item><title><![CDATA[New comment by kgeist in "Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs"]]></title><description><![CDATA[
<p>>Second, what's even more crazy is that roughly 98% of that DNA is actually non-coding.. just junk.<p>I think it's a myth that non-coding DNA is junk. See, for example:<p><a href="https://www.nature.com/articles/444130a" rel="nofollow">https://www.nature.com/articles/444130a</a><p>>'Non-coding' DNA may organize brain cell connections.</p>
]]></description><pubDate>Wed, 01 Apr 2026 08:50:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47598455</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47598455</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47598455</guid></item><item><title><![CDATA[New comment by kgeist in "TinyLoRA – Learning to Reason in 13 Parameters"]]></title><description><![CDATA[
<p>>One theory is that the knowledge required to solve the task is already stored in the parameters of the model, and only the style has to change for task success<p>>In particular, learning to generate longer outputs may be possible in few parameters<p>Reminded me of: <a href="https://arxiv.org/abs/2501.19393" rel="nofollow">https://arxiv.org/abs/2501.19393</a><p>>we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps<p>Maybe, indeed, the model simply learns to insert the EOS token (or similar) later, and the capability is already in the base model</p>
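<p>The budget-forcing trick itself is tiny. A toy sketch of the loop (step_fn stands in for a real decode step; names are mine, not from the paper's code):

```python
def budget_force(step_fn, prompt, min_tokens, eos="<eos>", wait="Wait"):
    # If the model tries to emit EOS before the thinking budget is
    # spent, suppress it and append "Wait" instead, forcing it to
    # keep reasoning (per the s1 paper's description).
    out = list(prompt)
    while True:
        tok = step_fn(out)
        if tok == eos:
            if len(out) >= min_tokens:
                break
            tok = wait  # swallow EOS, keep "thinking"
        out.append(tok)
    return out
```

<p>Which supports the point: the "capability" being toggled here is just where the EOS token lands.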
]]></description><pubDate>Wed, 01 Apr 2026 05:34:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597191</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47597191</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597191</guid></item><item><title><![CDATA[New comment by kgeist in "Mr. Chatterbox is a Victorian-era ethically trained model"]]></title><description><![CDATA[
<p>Prior art: <a href="https://news.ycombinator.com/item?id=46590280">https://news.ycombinator.com/item?id=46590280</a><p>>TimeCapsuleLLM: LLM trained only on data from 1800-1875</p>
]]></description><pubDate>Tue, 31 Mar 2026 05:50:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583228</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47583228</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583228</guid></item><item><title><![CDATA[New comment by kgeist in "Copilot edited an ad into my PR"]]></title><description><![CDATA[
<p>I think ads can be removed with abliteration, just like refusals in "uncensored" versions. Find the "ad vector" across activations and cancel it.</p>
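<p>Mechanically, abliteration is just linear algebra on the activations: estimate a direction (e.g. the mean activation difference between with-ads and without-ads prompts) and project it out. A toy numpy sketch, names mine:

```python
import numpy as np

def ablate(activations, direction):
    # remove each activation's component along the unwanted direction
    d = direction / np.linalg.norm(direction)
    return activations - np.outer(activations @ d, d)
```

<p>Whether a clean single "ad vector" exists is an empirical question; refusal directions turned out to be surprisingly linear, ads may or may not be.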
]]></description><pubDate>Mon, 30 Mar 2026 17:16:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47577056</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47577056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47577056</guid></item><item><title><![CDATA[New comment by kgeist in "Show HN: I made a "programming language" looking for feedback"]]></title><description><![CDATA[
<p><a href="https://en.wiktionary.org/wiki/glupe" rel="nofollow">https://en.wiktionary.org/wiki/glupe</a><p>Glupe is the plural form, "stupid ones" :)</p>
]]></description><pubDate>Mon, 30 Mar 2026 01:11:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47569303</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47569303</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47569303</guid></item><item><title><![CDATA[New comment by kgeist in "Show HN: I made a "programming language" looking for feedback"]]></title><description><![CDATA[
<p>Glupe means "stupid" in Slavic languages, was it on purpose?</p>
]]></description><pubDate>Sun, 29 Mar 2026 21:13:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47567382</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47567382</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47567382</guid></item><item><title><![CDATA[New comment by kgeist in "Folk are getting dangerously attached to AI that always tells them they're right"]]></title><description><![CDATA[
<p>>We evaluated 11 state-of-the-art AI-based LLMs, including proprietary models such as OpenAI’s GPT-4o<p>The study evaluates outdated models. GPT-4o was notoriously sycophantic, and GPT-5 was specifically trained to minimize sycophancy; from GPT-5's announcement:<p>>We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy<p>And then there was the whole drama in August 2025, when people complained GPT-5 was "colder" and "lacked personality" (= less sycophantic) compared to GPT-4o.<p>It would be interesting to study the evolution of sycophantic tendencies (decrease/increase) from version to version, i.e. whether companies are actually doing anything about it.</p>
]]></description><pubDate>Sat, 28 Mar 2026 16:10:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47555887</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47555887</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47555887</guid></item><item><title><![CDATA[New comment by kgeist in "Improving Composer through real-time RL"]]></title><description><![CDATA[
<p>>We used a Kimi base, with midtraining and RL on top. Going forward, we'll include the base used in our blog posts, that was a miss. Also, the license is through Fireworks. [0]<p>And still no mention of Kimi in the new blog post :)<p>Also, apparently the inference provider they use, Fireworks AI, already has a built-in API for RL-tuning Kimi [1], so I wonder which parts are Cursor's own effort and where Fireworks AI actually deserves credit, especially since they repeatedly brag about being able to create a new checkpoint every 5 hours, which would be largely thanks to Fireworks AI's API/training infrastructure.<p>I mean, I'm genuinely curious how much effort it would actually take me to go from "here, lots of user data" to "the model gains +1% on benchmarks" with my own finetune, assuming I already use a good existing foundation model, my inference provider already handles all the tuning infrastructure/logic, and I already have a lot of usage logs.<p>[0] <a href="https://news.ycombinator.com/item?id=47459529">https://news.ycombinator.com/item?id=47459529</a><p>[1] <a href="https://fireworks.ai/blog/kimi-k2p5" rel="nofollow">https://fireworks.ai/blog/kimi-k2p5</a></p>
]]></description><pubDate>Sat, 28 Mar 2026 01:09:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47550470</link><dc:creator>kgeist</dc:creator><comments>https://news.ycombinator.com/item?id=47550470</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47550470</guid></item></channel></rss>