<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: adroniser</title><link>https://news.ycombinator.com/user?id=adroniser</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 30 Apr 2026 10:33:42 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=adroniser" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by adroniser in "GPT-5.4 Pro solves Erdős Problem #1196"]]></title><description><![CDATA[
<p>What are you even yapping about?</p>
]]></description><pubDate>Sat, 18 Apr 2026 22:25:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47820050</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=47820050</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47820050</guid></item><item><title><![CDATA[New comment by adroniser in "Starting from scratch: Training a 30M Topological Transformer"]]></title><description><![CDATA[
<p>Adding the position vector is basic, sure, but it's naive to think the model doesn't develop its own positional system by bootstrapping on top of the barebones one.</p>
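<p>For concreteness, the barebones system is just something like this (a toy sketch with made-up sizes, not the paper's actual code):</p>
<pre><code>import torch
import torch.nn as nn

# Toy sketch of the "barebones" positional system (sizes made up):
# a learned position vector added to each token embedding. Anything
# richer than this, the model has to develop for itself inside its
# attention layers.
vocab_size, max_len, d_model = 50000, 512, 256
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)

def embed(token_ids):  # token_ids: (batch, seq)
    positions = torch.arange(token_ids.shape[1])
    return tok_emb(token_ids) + pos_emb(positions)
</code></pre>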
]]></description><pubDate>Sun, 18 Jan 2026 14:16:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46667965</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=46667965</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46667965</guid></item><item><title><![CDATA[New comment by adroniser in "AI model trapped in a Raspberry Pi"]]></title><description><![CDATA[
<p>fMRIs are correlational nonsense (see Brainwashed, for example), and so are any "model introspection" tools.</p>
]]></description><pubDate>Sat, 27 Sep 2025 18:08:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=45398079</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=45398079</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45398079</guid></item><item><title><![CDATA[New comment by adroniser in "Language models pack billions of concepts into 12k dimensions"]]></title><description><![CDATA[
<p>For papers submitted to a conference, it may be that reviewers don't offer suggestions that would significantly improve the quality of the work; indeed, the quality of reviews has gone down significantly in recent years. But if Anthropic were going to submit this work to peer review, they would be forced to tighten it up significantly.<p>The linear-probe paper is still written in a format in which it could reasonably be submitted, and indeed it was submitted to an ICLR workshop.</p>
]]></description><pubDate>Mon, 15 Sep 2025 17:30:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=45252524</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=45252524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45252524</guid></item><item><title><![CDATA[New comment by adroniser in "Language models pack billions of concepts into 12k dimensions"]]></title><description><![CDATA[
<p>So you think that this blog post would make it into any of the mainstream conferences? I doubt it.</p>
]]></description><pubDate>Mon, 15 Sep 2025 11:03:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=45248298</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=45248298</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45248298</guid></item><item><title><![CDATA[New comment by adroniser in "Language models pack billions of concepts into 12k dimensions"]]></title><description><![CDATA[
<p>Peer review would encourage less hand-wavy language and more precise claims. Reviewers would penalize the authors for bringing up bizarre analogies to physics concepts for seemingly no reason. They would criticize the fact that the whole post talks about features without a concrete definition of a feature.<p>The sloppiness of the circuits-thread blog posts has been very damaging to the health of the field, in my opinion. People first learn about mech interp from these blog posts, and then they adopt a similarly sloppy style in discussion.<p>Frankly, the whole field is currently just a big circle jerk, and it's hard not to think these blog posts are responsible for that.<p>I mean, do you actually think this kind of slop would be publishable at NeurIPS if they submitted the blog post as is?</p>
]]></description><pubDate>Mon, 15 Sep 2025 09:57:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45247960</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=45247960</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45247960</guid></item><item><title><![CDATA[New comment by adroniser in "Canaries in the Coal Mine? Recent Employment Effects of AI [pdf]"]]></title><description><![CDATA[
<p>Didn't hackers use to be for piracy?</p>
]]></description><pubDate>Thu, 28 Aug 2025 11:15:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=45050779</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=45050779</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45050779</guid></item><item><title><![CDATA[New comment by adroniser in "Canaries in the Coal Mine? Recent Employment Effects of AI [pdf]"]]></title><description><![CDATA[
<p>This suggests people should pre-register benchmarks, because currently it feels like there is little incentive to publish benchmarks that models saturate.</p>
]]></description><pubDate>Thu, 28 Aug 2025 11:03:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=45050698</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=45050698</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45050698</guid></item><item><title><![CDATA[New comment by adroniser in "Learning to Reason with LLMs"]]></title><description><![CDATA[
<p>But there are lots of models available now that render much faster and are better quality than Sora.</p>
]]></description><pubDate>Thu, 12 Sep 2024 22:57:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=41526512</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41526512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41526512</guid></item><item><title><![CDATA[New comment by adroniser in "The AI Scientist: Towards Automated Open-Ended Scientific Discovery"]]></title><description><![CDATA[
<p>I completely agree, this shit is so depressing. When I saw the AlphaProof paper I basically spent 3 days in mourning, because their approach was so simple.</p>
]]></description><pubDate>Tue, 13 Aug 2024 23:40:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=41241008</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41241008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41241008</guid></item><item><title><![CDATA[New comment by adroniser in "The AI Scientist: Towards Automated Open-Ended Scientific Discovery"]]></title><description><![CDATA[
<p>I think the whole paper is a satire lol.</p>
]]></description><pubDate>Tue, 13 Aug 2024 23:36:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=41240965</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41240965</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41240965</guid></item><item><title><![CDATA[New comment by adroniser in "The AI Scientist: Towards Automated Open-Ended Scientific Discovery"]]></title><description><![CDATA[
<p>Does it really? If you want an LLM to edit code, you need to feed it every single line of code in the prompt. Is it really that surprising that, having just learnt it has been timed out, and then seeing code with an explicit timeout in it, it edits it? This is just a claim about the underlying foundation LLM, since the whole science thing is just a wrapper.<p>I think this bit of it is just a gimmick put in for hype purposes.</p>
]]></description><pubDate>Tue, 13 Aug 2024 23:30:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=41240924</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41240924</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41240924</guid></item><item><title><![CDATA[New comment by adroniser in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>I agree with you that transformers are probably not the architecture of choice. I'm not sure what that has to do with the viability of RL, though.</p>
]]></description><pubDate>Sun, 11 Aug 2024 13:24:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=41215958</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41215958</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41215958</guid></item><item><title><![CDATA[New comment by adroniser in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>Hmm, well, the reason a pre-trained transformer is a fancy sentence-completion engine is that that is what it is trained on: cross-entropy loss on next-token prediction. As I say, if you train an LLM to do math proofs, it learns to solve 4 out of the 6 IMO problems. I feel like you're not appreciating how impressive that is, and it is only possible because of the RL aspect of the system.<p>To be clear, I'm not claiming that you take an LLM, do some RL on it, and suddenly it can do particular tasks. I'm saying that if you train it from scratch using RL, it will be able to do certain well-defined formal tasks.<p>I don't know what you mean about the online-learning ability, to be honest. The paper uses it in the exact way you specify: it uses RL to play Montezuma's Revenge and gets better on the fly.<p>Similarly for my point about the inference-time RL ability of the AlphaProof LLM. That's why I emphasized that RL is done at inference time: each proof it does is used to make itself better for the next one.<p>I think you are taking LLM to mean GPT-style models, while I am taking LLM to mean transformers that output text, which can be trained to do any variety of things.</p>
]]></description><pubDate>Sun, 11 Aug 2024 12:35:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=41215729</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41215729</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41215729</guid></item><item><title><![CDATA[New comment by adroniser in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>But RL algorithms do implement things like curiosity to drive exploration: <a href="https://arxiv.org/pdf/1810.12894" rel="nofollow">https://arxiv.org/pdf/1810.12894</a><p>Thinking to arbitrary depth sounds like Monte Carlo tree search, which is often implemented in conjunction with RL. And working memory, I think, is a matter of the architecture you use in conjunction with RL; I agree that transformers aren't very helpful for this.<p>I think what you call 'trial and error' is what I intuitively think of RL as doing.<p>AlphaProof runs an RL algorithm during training AND at inference time. When given an olympiad problem, it generates many variations on that problem, tries to solve them, and then uses RL to effectively fine-tune itself on the particular problem currently being solved. Note again that this process is done at inference time, not just in training.<p>And AlphaProof uses an LLM to generate the Lean proofs, and uses RL to train that LLM. So it strikes me as a type error to say that DeepMind have somehow abandoned RL in favour of LLMs. Note this Demis tweet <a href="https://x.com/demishassabis/status/1816596568398545149" rel="nofollow">https://x.com/demishassabis/status/1816596568398545149</a>, where he seems to be saying that they are going to combine some of this RL stuff with the main Gemini models.</p>
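<p>For concreteness, the curiosity bonus in that paper boils down to something like this (a toy sketch, sizes made up): a frozen random network, a trained predictor, and the prediction error as the bonus.</p>
<pre><code>import torch
import torch.nn as nn

# Toy sketch of the RND curiosity bonus (sizes made up): a frozen,
# randomly initialised "target" net and a trained "predictor" net.
# The predictor's error on an observation is the intrinsic reward,
# so states the predictor hasn't seen much of score high.
obs_dim, feat_dim = 64, 32
target = nn.Linear(obs_dim, feat_dim)     # stays random, never trained
predictor = nn.Linear(obs_dim, feat_dim)  # trained to imitate target
for p in target.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):  # obs: (batch, obs_dim) float tensor
    err = (predictor(obs) - target(obs)).pow(2).mean(dim=-1)
    opt.zero_grad()
    err.mean().backward()   # predictor catches up on familiar states
    opt.step()
    return err.detach()     # added to the extrinsic reward signal
</code></pre>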
]]></description><pubDate>Sun, 11 Aug 2024 09:56:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=41215107</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41215107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41215107</guid></item><item><title><![CDATA[New comment by adroniser in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>Isn't RL basically the algorithm we want?</p>
]]></description><pubDate>Sat, 10 Aug 2024 12:03:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=41208885</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41208885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41208885</guid></item><item><title><![CDATA[New comment by adroniser in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>The distinction is that, in this case, LLMs are not used for what they are trained on. In the vast majority of cases, someone using an LLM is not interested in what some mixture of OpenAI employees' ratings plus the average person would say about a topic; they are interested in the correct answer.<p>When I ask ChatGPT for code, I don't want it to imitate humans, I want it to be better than humans. My reward function should then be code that actually works, not code that is similar to what humans write.</p>
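<p>Something as crude as this is the kind of reward I mean (a toy sketch; a real pipeline would obviously need sandboxing and a proper test harness):</p>
<pre><code>import os
import subprocess
import tempfile

def reward(candidate_code, test_code):
    """Toy version of the reward I mean: 1.0 if the generated code
    passes its tests, 0.0 otherwise, with no imitation of humans
    anywhere. (Sketch only; a real setup needs sandboxing.)"""
    with tempfile.NamedTemporaryFile(
            "w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path],
                                capture_output=True, timeout=10)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)
</code></pre>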
]]></description><pubDate>Sat, 10 Aug 2024 11:58:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=41208861</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41208861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41208861</guid></item><item><title><![CDATA[New comment by adroniser in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>How about when you want to solve sudoku, say, and you simply specify that you want the output to have unique numbers in each row, unique numbers in each column, and no repeated number in any 3x3 box?<p>I feel like this is a very different type of programming, even if in some cases it would wind up being the same thing.</p>
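<p>Those three constraints are mechanically checkable, which is the whole point. A toy sketch:</p>
<pre><code>def satisfies_sudoku(grid):
    """Check a 9x9 grid (list of 9 lists of 9 ints) against exactly
    the three constraints above and nothing else."""
    rows = [grid[r] for r in range(9)]
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[3 * br + r][3 * bc + c]
              for r in range(3) for c in range(3)]
             for br in range(3) for bc in range(3)]
    # every row, column and box must be a permutation of 1..9
    return all(sorted(group) == list(range(1, 10))
               for group in rows + cols + boxes)
</code></pre>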
]]></description><pubDate>Sat, 10 Aug 2024 11:51:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=41208833</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41208833</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41208833</guid></item><item><title><![CDATA[New comment by adroniser in "AI solves International Math Olympiad problems at silver medal level"]]></title><description><![CDATA[
<p>"AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go."</p>
]]></description><pubDate>Fri, 26 Jul 2024 12:24:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=41078017</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=41078017</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41078017</guid></item><item><title><![CDATA[New comment by adroniser in "Reasoning in Large Language Models: A Geometric Perspective"]]></title><description><![CDATA[
<p>I did read your entire comment, and that is what prompted my response: from my perspective, your entire premise was based on LLMs failing at simple examples, and yet, despite admitting you thought there was a chance an LLM would succeed at your example, it didn't seem you'd bothered to check.<p>The argument you are making rests on the example being simple. If the example were not simple, you would not be able to use it to dismiss LLMs.<p>I am not surprised that GPT-3.5 and GPT-4o failed; they are both terrible models. GPT-4o is multimodal, but it is far buggier than GPT-4. I tried with Claude 3.5 Sonnet and it got it first try. It was also able to compute the moves when told the rule change.</p>
]]></description><pubDate>Mon, 08 Jul 2024 17:20:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=40907301</link><dc:creator>adroniser</dc:creator><comments>https://news.ycombinator.com/item?id=40907301</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40907301</guid></item></channel></rss>