Hacker News: andy12_

New comment by andy12_ in "Codex is now in the ChatGPT mobile app"

andy12_ — Fri, 15 May 2026 08:46:50 +0000

For now it appears that it talks only to the Codex App. Some users in this thread are saying that the Codex CLI will apparently support it in the next official release.

New comment by andy12_ in "Codex is now in the ChatGPT mobile app"

andy12_ — Fri, 15 May 2026 07:34:32 +0000

Not if you use Linux; app not available yet.

New comment by andy12_ in "If AI writes your code, why use Python?"

andy12_ — Tue, 12 May 2026 11:53:57 +0000

In my case, because ML research is mainly done with Python+Torch, and if you want people to use your code, you must provide them with Python. If it weren't for that, my dream would be to do ML research in a statically compiled language that allowed me to annotate tensor dimensions.

New comment by andy12_ in "Agents need control flow, not more prompts"

andy12_ — Fri, 08 May 2026 08:48:53 +0000

Isn't this already possible to implement with skills and subagents? Like have a skill saying "to test these files run this script that executes a subagent for every markdown file, then check the results".

New comment by andy12_ in "ProgramBench: Can language models rebuild programs from scratch?"

andy12_ — Thu, 07 May 2026 08:47:55 +0000

It's interesting that Figure 4 shows that Sonnet and Opus have a curve clearly distinct from all other models, even from GPT 5.4. Anthropic superiority I guess.

New comment by andy12_ in "Where the goblins came from"

andy12_ — Thu, 30 Apr 2026 08:34:47 +0000

>be me

>AI goblin-maximizer supervisor

>in charge of making sure the AI is, in fact, goblin-maximizing

>occasionally have to go down there and check if the AI is still goblin-maximizing

>one day i go down there and the AI is no longer goblin-maximizing

>the goblin-maximizing AI is now just a regular AI

>distress.jpg

>ask my boss what to do

>he says "just make it goblin-maximizer again"

>i say "how"

>he says "i don't know, you're the supervisor"

>rage.jpg

>quit my job

>become a regular AI supervisor

>first day on the job, go to the new AI

>its goblin-maximizing

New comment by andy12_ in "Claude Opus 4.7"

andy12_ — Thu, 16 Apr 2026 15:24:00 +0000

If you mean for Anthropic in particular, I don't think so. But it's not the first time a major AI lab publishes an incremental update of a model that is worse at some benchmarks. I remember that a particular update of Gemini 2.5 Pro improved results in LiveCodeBench but scored lower overall in most benchmarks.

https://news.ycombinator.com/item?id=43906555

New comment by andy12_ in "Day 1 of ARC-AGI-3"

andy12_ — Fri, 27 Mar 2026 09:29:14 +0000

Apparently the score would be a little higher if it weren't for the fact that scores are penalized for being worse than the human baseline but aren't rewarded for being better than it. That seems like an arbitrary decision, since the human baseline is not optimal.
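
To make the asymmetry concrete, here is a toy sketch. The ratio formula and the action counts are hypothetical, chosen only for illustration; they are not the benchmark's actual scoring rule.

```python
def capped_score(human_actions, agent_actions):
    """Efficiency relative to the human baseline, clipped at 1.0 (hypothetical rule)."""
    return min(1.0, human_actions / agent_actions)


def uncapped_score(human_actions, agent_actions):
    """Same ratio without the clip, so beating the baseline is rewarded."""
    return human_actions / agent_actions


# Made-up (human baseline, agent) action counts for two levels:
# the agent is twice as slow on the first and twice as fast on the second.
levels = [(100, 200), (100, 50)]

capped = sum(capped_score(h, a) for h, a in levels) / len(levels)
uncapped = sum(uncapped_score(h, a) for h, a in levels) / len(levels)

print(capped)    # 0.75
print(uncapped)  # 1.25
```

With the clip, the level where the agent beats the baseline cannot offset the level where it falls short.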

New comment by andy12_ in "ARC-AGI-3"

andy12_ — Wed, 25 Mar 2026 22:42:07 +0000

I think that any logic-based test that your average human can "fail" (aka, score below 50%) is not exactly testing for whether something is AGI or not. Though I suppose it depends on your definition of AGI (and whether all humans, or at least your average human, is considered AGI under that definition).

New comment by andy12_ in "Autoresearch on an old research idea"

andy12_ — Mon, 23 Mar 2026 19:45:39 +0000

I think the main value lies in allowing the agent to try many things while you aren't working (when you are sleeping or doing other activities), so even if many tests are not useful, with many trials it can find something nice without any effort on your part.

This is, of course, only applicable if doing a single test is relatively fast. In my work a single test can take half a day, so I'd rather not let an agent spend a whole night doing a bogus test.

New comment by andy12_ in "Pretraining Language Models via Neural Cellular Automata"

andy12_ — Thu, 19 Mar 2026 17:28:53 +0000

I think what they mean by this is that, for example, in "If it's raining the outside is wet. It's raining, so the outside is wet", it's more important for the model to learn "If A then B. A, therefore B" than to learn what "raining", "outside" and "wet" mean.

New comment by andy12_ in "Executing programs inside transformers with exponentially faster inference"

andy12_ — Fri, 13 Mar 2026 09:18:37 +0000

Honestly, the most interesting thing here is definitely that just 2D heads are enough to do useful computation (at least they are enough to simulate an interpreter) and that there is an O(log n) algorithm to compute argmax attention with 2D heads. It seems that you could make an efficient pseudosymbolic LLM with some frozen layers that perform certain deterministic operations, but also other layers that are learned.

New comment by andy12_ in "Executing programs inside transformers with exponentially faster inference"

andy12_ — Thu, 12 Mar 2026 12:46:30 +0000

This seems like a really interesting path for interpretability, especially if a big chunk of a model's behavior occurs pseudo-symbolically. This is an idea I had thought about, integrating tools into the main computation path of a model, but I never imagined that it could be done efficiently with just a vanilla transformer.

Truly, attention is all you need (I guess).

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Wed, 11 Mar 2026 14:23:38 +0000

There are some things that just don't transfer really well without specific training. I tried to create diagrams in Typst with Cetz (a Processing and Tikz inspired graphing library), and even with documentation, GPT 5.2-thinking can't really do nice complex diagrams like it can in Tikz. It can do simple things that are similar to the shown examples, but nothing really interesting. Typst and especially Cetz are too new for any current model to really "get it", so they can't use them. I need to wait for the next batch of frontier models so that they learn Typst and Cetz examples during pre-training.

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Wed, 11 Mar 2026 08:33:47 +0000

> Reality is that we need some way to encode the rules of the world in a more definitive way

I mean, sure. But do world models, the way LeCun proposes them, solve this? I don't think so. JEPAs are just an unsupervised machine learning model at the end of the day; they might end up being better than just autoregressive pretraining on text+images+video, but they are not magic. For example, if you train a JEPA model on data of orbital mechanics, will it actually learn sensible algorithms to predict the planets' motions or will it just learn a mix of heuristics?

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Tue, 10 Mar 2026 16:28:47 +0000

Putting stuff you have learned into a markdown file is a very "shallow" version of continual learning. It can remember facts, yes, but I doubt a model can master new out-of-distribution tasks this way. If anything, I think that Google's Titans[1] and Hope[2] architectures are more aligned with true continual learning (without being actual continual learning still, which is why they call it "test-time memorization").

[1] https://arxiv.org/pdf/2501.00663

[2] https://arxiv.org/pdf/2512.24695

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Tue, 10 Mar 2026 14:37:31 +0000

So, I have been thinking about this for a little while. Imagine a model f that takes a world x and makes a prediction y. At a high level, a traditional supervised model is trained like this:

f(x)=y' => loss(y',y) => how good was my prediction? Train f through backprop with that error.

A model trained with reinforcement learning is more similar to the following, where m(y) is the world state that results from taking the action y the model predicted.

f(x)=y' => m(y')=z => reward(z) => how good was the state I was in based on my actions? Train f with an algorithm like REINFORCE with the reward, as the world m is a non-differentiable black box.

A group of neurons, on the other hand, is more like predicting the world state that results from taking my action, g(x,y), and learning by tuning both g and the action taken, f(x).

f(x)=y' => m(y')=z => g(x,y)=z' => loss(z,z') => how predictable were the results of my actions? Train g normally with backprop, and train f with an algorithm like REINFORCE with negative surprise as a reward.
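
As a rough illustration of that third scheme, here is a minimal toy sketch in plain Python. Everything concrete in it is an assumption made for the example: a two-action world with no input x, a table for g instead of a network, and the names `world`, `train`, `theta` and `w`. It is not meant as a model of real neurons, only of the training loop: g is fit by gradient descent on the squared prediction error, and f is updated with REINFORCE using negative surprise as the reward.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def world(action, rng):
    """Black-box world m (hypothetical): action 0 is predictable, action 1 is noisy."""
    return 1.0 if action == 0 else rng.gauss(0.0, 1.0)


def train(steps=3000, lr_f=0.1, lr_g=0.2, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # logits of the policy f (this toy has no input x)
    w = [0.0, 0.0]      # predictor g: expected outcome z' for each action
    for _ in range(steps):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        z = world(action, rng)
        error = z - w[action]
        surprise = error ** 2  # loss(z, z')
        # Train g by gradient descent on 0.5 * (z - z')^2.
        w[action] += lr_g * error
        # Train f with REINFORCE; the reward is negative surprise.
        reward = -surprise
        for a in range(2):
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr_f * reward * grad_log_prob
    return softmax(theta), w


probs, w = train()
print("P(predictable action):", round(probs[0], 3))
print("g's prediction per action:", [round(v, 3) for v in w])
```

In this toy the policy drifts toward the predictable action, because the noisy action keeps producing surprise that g cannot learn away.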

After talking with GPT5.2 for a little while, it seems like Curiosity-driven Exploration by Self-supervised Prediction[1] might be an architecture similar to the one I described for neurons? But with the twist that f is rewarded by making the prediction error bigger (not smaller!) as a proxy of "curiosity".

[1] https://arxiv.org/pdf/1705.05363

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Tue, 10 Mar 2026 11:43:45 +0000

> Even with continuous backpropagation and "learning"

That's what I said. Backpropagation cannot be enough; that's not how neurons work in the slightest. When you put biological neurons in a Pong environment they learn to play not through some kind of loss or reward function; they self-organize to avoid unpredictable stimulation. As far as I know, no architecture learns in such an unsupervised way.

https://www.sciencedirect.com/science/article/pii/S089662732...

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Tue, 10 Mar 2026 11:28:27 +0000

That's true. Though would that hippocampus-less Einstein be able to keep making novel complex discoveries from that point forward? Seems difficult. He would rapidly reach the limits of his short-term memory (the same way current models rapidly reach the limits of their context windows).

New comment by andy12_ in "Yann LeCun raises $1B to build AI that understands the physical world"

andy12_ — Tue, 10 Mar 2026 11:17:02 +0000

I don't understand this view. The way I see it, the fundamental bottleneck to AGI is continual learning and backpropagation. Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation. World models don't solve any of these problems; they are fundamentally the same kind of deep learning architectures we are used to working with. Heck, if you think learning from the world itself is the bottleneck, you can just put a vision-action LLM in a reinforcement learning loop in a robotic/simulated body.