<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: stalfie</title><link>https://news.ycombinator.com/user?id=stalfie</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 30 May 2026 21:21:28 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=stalfie" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by stalfie in "Snowboard Kids 2 is 100% Decompiled"]]></title><description><![CDATA[
<p>The counterpoint is that every company ever has based themselves on human effort they never paid for (usually). The entire scientific endeavour for example. Standing on the shoulders of giants and so on.</p>
]]></description><pubDate>Sat, 30 May 2026 09:09:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48334228</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48334228</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48334228</guid></item><item><title><![CDATA[New comment by stalfie in "Sam Altman Won in Court Against Elon Musk. But, We All Lost"]]></title><description><![CDATA[
<p>Ok, fair point about the lizardbrain jealousy, the choice of wording there was needlessly antagonistic. In my defense, even though that point might seem reductive, I don't mean it to be. I'd say the only reason words like "fair" even exist is because of that basic emotional response. We dress it up in higher order concepts, but I genuinely think everything about fair distribution boils down to human emotional responses that in different contexts are given names such as "greed" or "jealousy". And the tribal environment IS the environment our brains are adapted to. I don't think it's necessarily reductive to point out the very foundation of where the impulses from our cognitive medium are coming from. It's the basis of pretty much everything about human society. From a sibling being pissed because he got one less slice of cake, to intellectual property law. It all boils down to the same emotional programming.<p>And I just want to be absolutely clear on one thing: I am in no way, shape or form mocking science. I am part of academia, and I see science as the single most valuable thing humanity is doing. And I am not convinced at all that the capitalist way things are organized is the best overall, but this is mostly because it incentivises locking down knowledge behind intellectual property. Personally, I am for radical openness of knowledge, with no paywalls, and I think the reward structures for producing knowledge should be separate from how that knowledge is being used. Even the publication system is too greed-based in my opinion. If it were up to me, every lab would livestream their work and publish every thought and idea as it happens (with the possible exception of biosecurity-adjacent stuff), and 10-20% of taxes going to science would be international law.<p>This issue is not what I was discussing. What I perceived us to be discussing was whether or not Jonas Salk should have been pissed at the pharmaceutical companies profiting off his vaccine.
As it happened, Jonas Salk was charitable, believed in the openness of science, and understood the fact that someone actually had to produce and distribute the polio vaccine, and that this costs money. And as it happened, society had given this mandate to private companies that operate with profit margins. And if insisting on a cut would have resulted in even a single additional person dying who wouldn't have otherwise, which it probably would have, then he wasn't going to insist. Also, he was too busy doing science; he even commented later that his fame was partially an unwelcome distraction from that.<p>To me, this is the ideal of a scientist. Someone who knows they stand on the shoulders of giants, and doesn't fret about the fact that the people who are standing on his are closer to the sun. The difference between Jonas Salk and everyone complaining about not being rewarded for training tokens is that Jonas Salk made the choice to not patent his invention, and here the labs made the choice for everyone else. Jonas Salk and many others are charitable, but not everyone is, hence the complaining. But if everyone is forced to be Salk against their will, is that really so bad?<p>I see the LLM labs as the pharmaceutical companies, occupying a societal mandate to actually produce, with all the perils that come with it. But "giving back" is not part of that mandate, and unfortunate as that might be, that is not their fault. They are tasked with production, competition and progress, and that is already so expensive that they are struggling to meet demand.<p>And, you know, if redistribution truly is as easy as you say, at this thought my brain also produces a tingle of anger at the injustice of it. And if I look for somewhere to direct that anger, I even have a name and a face! Look at Sam Altman, that smug supercar driving bastard, profiting off the hard work of Stack Overflow commenters everywhere. Eat him! Like, not in a gay way, but like in eat the rich!
Out with the guillotines!<p>To me, that's my lizardbrain talking. The reality of our more complex non-tribal society is that the corporate structures we have created were not tasked with distributing rewards fairly, they were tasked with competing no matter the cost. And so successful companies do that, because the ones that don't disappear.<p>And like it or not, this methodology seems to work, on the whole. Even though it also offends my scientist sensibilities, it turns out that humans are greedy, and so incorporating that impulse into a structure that is limited to soft power is a good idea (unlike classic communism, where the same powers that produced products could also kill you). And it's not Sam's fault that this is how it is. "Just running a business" is the reward structure society has created, and it's not part of Sam's job description to break that mold and start rewarding people he doesn't have to. In fact it might even be illegal for him to do so if it doesn't also reward investors somehow (PR-wise, for example).<p>And the overall fact is that the labs are doing what they need to do in order to produce something entirely new, that no one was even sure would be useful before they created it. And they are the representatives of the capitalist way of doing things, and if a publicly funded LLM undercuts them, that is fine. But maybe you do actually need a trillion dollars to make useful LLMs. Overall it's a good thing that someone is at least trying with funds, because there was no one lining up to buy 200k H100s for even the most prestigious of publicly funded academic institutions, and certainly not for sending a check to all authors on arXiv.<p>And so overall, capitalism seems to me to be doing a fine job, and I pay my subscription fee gladly. And when my lizardbrain provides its hateful opinion, I think of Jonas Salk, and it doesn't seem so bad after all.</p>
]]></description><pubDate>Sat, 23 May 2026 11:49:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48246886</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48246886</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48246886</guid></item><item><title><![CDATA[New comment by stalfie in "Sam Altman Won in Court Against Elon Musk. But, We All Lost"]]></title><description><![CDATA[
<p>Well, is it true that they give back nothing? What about the compute? Paychecks? The value of the subscription you pay for? What about actual examples of things they have given back for free, like Whisper, which used to be SOTA and is still extremely useful? Occasional excellent research papers, particularly from Anthropic?<p>My point about "moral panic" is that it leads to statements like "giving back nothing"... which are objectively untrue. They might not reward every person that contributed to the tokens they are training on, but doing so is extremely difficult in practice, hard to do fairly, and also probably a waste of limited resources in terms of net human progress. All these companies are doing is exactly what society has set up as the capitalist methodology to get work done: gather investors, pay people, sell products, etc. As opposed, for example, to the communist party deciding that the state should fund your project, or fighting for a research grant, or some other methodology, which might or might not work as well.<p>The only curious part about capitalism is that some individuals get a disproportionate amount of the reward for work done. At a societal level, this is essentially a soft power redistribution system, but it often also leads to obnoxious individuals with supercar collections. Whether this is an overall good or bad thing for human progress is really, really hard to say for sure. However, it has a tendency to trigger a lizardbrain response that evolved to promote resource sharing in tribal societies, which was the best overall strategy in that setting. Or in other words, it makes people jealous.</p>
]]></description><pubDate>Fri, 22 May 2026 20:44:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48241385</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48241385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48241385</guid></item><item><title><![CDATA[New comment by stalfie in "Sam Altman Won in Court Against Elon Musk. But, We All Lost"]]></title><description><![CDATA[
<p>I'd say both are equally soulless; dualism is a little bit of a philosophical dead end.<p>Frankly, I don't particularly care much for the moral panic around capitalism. Capitalism has its downsides for sure, but it's the system our society has chosen to motivate people, and it seems to work okay for many things. Does it matter if the AI model that solves your diagnosis, creates a life saving drug or solves an Erdős problem is made by a corporation or not? It bothers me none, progress is progress. As long as the authors of textbooks everywhere wouldn't have otherwise invented LLMs a decade ago if they only had been given a little bit more money, then I'd say the money is going to the right place.</p>
]]></description><pubDate>Fri, 22 May 2026 17:05:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48238571</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48238571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48238571</guid></item><item><title><![CDATA[New comment by stalfie in "Sam Altman Won in Court Against Elon Musk. But, We All Lost"]]></title><description><![CDATA[
<p>That sentence could easily be applied to the human baseline.</p>
]]></description><pubDate>Fri, 22 May 2026 15:27:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48237229</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48237229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48237229</guid></item><item><title><![CDATA[New comment by stalfie in "Gen Z Resentment Toward AI Grows as Adoption Stagnates and Workplace Fears Mount"]]></title><description><![CDATA[
<p>For the past year, I think I've watched more AI generated video content than movies in terms of hours spent. Some of it is quite good (e.g. Neuralwiz)! Granted, I watch very few movies, but still, I'd say this kind of counts.</p>
]]></description><pubDate>Sun, 10 May 2026 11:24:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=48082992</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48082992</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48082992</guid></item><item><title><![CDATA[New comment by stalfie in "AlphaEvolve: Gemini-powered coding agent scaling impact across fields"]]></title><description><![CDATA[
<p>Well, if the evaluation infrastructure is something humans could have had access to before, and the agent's key "skill" is just that it's a more patient and scalable worker, I would still argue that this "comes from the agent".<p>Humans get bored, impatient, or run out of time, and so often give up at what they perceive to be a decent "local minimum". Early verification harnesses using GPT-4 for optimizing robot reward functions succeeded quite well on the fact that the LLM just kept going (link below). As long as it is too boring for a human to use the same evaluation infrastructure, this is still an agent skill.<p><a href="https://arxiv.org/abs/2310.12931" rel="nofollow">https://arxiv.org/abs/2310.12931</a></p>
]]></description><pubDate>Thu, 07 May 2026 15:30:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48050628</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=48050628</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48050628</guid></item><item><title><![CDATA[New comment by stalfie in "This year’s insane timeline of hacks"]]></title><description><![CDATA[
<p>Actually you're right, upon reflection the medical records example is a terrible one, given the proclivities of many governments and/or vindictive mobs. Although the greater issue here is that there exist governments that care about abortions, and the fact that people accept living under their reign one way or another. Unfortunately those governments are often in positions of power to figure this out and punish individuals no matter what.<p>And I'd just like to underline the fact that this is truly a devil's advocate position, not something I'd argue strongly for.<p>But for the LLM training data company, does that leak matter? I guess that depends on your stance about AI proliferation and safety. But if you don't, it's at worst a boost for open source LLMs. Rockstar? A great deal of hard work has surely gone into GTA-6 between all the union busting, but it hardly matters for humanity what particular game people use to entertain themselves. And the medical device company, although the wipe part is truly just senseless destruction, actually might benefit humanity more if a few bootleg factories of their products appear.<p>Many of these are very stretched scenarios. But for instance in the case of espionage, the problem is not the fact that people are spying, the problem is that there is a war. And the more nefarious regimes tend to depend more on secrecy and lies in order to perpetuate themselves. If total transparency were applied to all governments equally, most democracies would be positively affected. The problem is not the leakage of the Epstein files. It's that this kind of activity could occur in secret and remain covered up.</p>
]]></description><pubDate>Mon, 13 Apr 2026 17:49:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47755562</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47755562</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47755562</guid></item><item><title><![CDATA[New comment by stalfie in "This year’s insane timeline of hacks"]]></title><description><![CDATA[
<p>If I can play devil's advocate in favor of public disinterest about these events, I think you can argue that cybersecurity doesn't really matter, in the grand scheme of things. At least data exfiltration.<p>What would the consequences for humanity be if every single electronic patient record was leaked onto the internet? Immediately hugely bad for some groups, unfortunately. After a good deal of embarrassment and drama, some of it severe, the net effect is perhaps positive. It would most likely facilitate a lot of scientific inquiry. A lot of people, especially in medical deserts, also use ChatGPT as an MD. Providing AI companies with high quality medical data is actually a public service.<p>So it goes for many things in life, and except for financial and destructive wipe attacks, data security is mostly about protecting the IP of incumbents, which is somewhere between irrelevant and a net negative. It's hard to say what the long term consequences of the IP system breaking down would be, but there is a good argument to be made that it's not necessarily bad.<p>As for individual people, most don't really care or are resigned to the fact that Google already knows everything about them, and probably abstractly enjoy the fact that a major company gets brought down to their reality. Plenty of societies have extremely collectivistic mindsets about public info being shared, like Scandinavian countries having public tax filings, and they work just fine.<p>I think most people would secretly relish the outcomes of everything leaking everywhere. Just like people relish the Epstein files being released, and probably would have loved an unredacted version being leaked. Secrets are something human beings naturally gravitate towards digging up and sharing, and this is actually for good, sensible reasons. Evolution has simply favored groups that did not hoard knowledge, at least not internally.
There is a reason the scientific method has openness as a virtue, and it is arguably one of the pillars that has carried humanity out of the dark ages.</p>
]]></description><pubDate>Mon, 13 Apr 2026 16:47:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47754735</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47754735</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47754735</guid></item><item><title><![CDATA[New comment by stalfie in "US appeals court declares 158-year-old home distilling ban unconstitutional"]]></title><description><![CDATA[
<p>One problem with this mentality is that reality doesn't really make the ideological distinction between what's private and what isn't, or who pays for what. Healthcare is not an intersubjective field, and so actions have consequences, no matter what you think about them.<p>Vaccines are a good example of this: herd immunity is needed for many of them to work. Antibiotic stewardship is another: unregulated usage of antibiotics risks breeding superbugs.<p>More generally, "private" ideas are rarely private. Kids born to idiots practicing alternative medicine often die. This scales to societal effects if you have enough idiots. Even though capitalism makes this very fuzzy, many resources in medicine are in fact finite, meaning that time and money spent on one person might mean that another dies. Sometimes that other person is in another, usually poorer country. COVID vaccine availability illustrated that effect nicely.<p>Essentially what you are advocating is widespread natural selection, with potential consequences affecting anywhere from small local communities to the entire planet in rare cases (COVID is a good one, look up Trichophyton indotineae for a recent example). And even if you actually do want that, unless you truly follow through, this also comes with a huge amount of waste of very limited resources. That is unless you are willing to go the distance and advocate that unvaccinated kids with pneumonia from a measles infection should just go ahead and die because of their parents' or neighbors' stupid choices.<p>If you take Kant's approach to ethics, that you should only act on principles that you would want to become a universal law, then the principle of healthcare being a private matter is a bit of a non-starter, at least by most ethical systems.</p>
]]></description><pubDate>Mon, 13 Apr 2026 15:15:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47753224</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47753224</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47753224</guid></item><item><title><![CDATA[New comment by stalfie in "Peptides: where to begin?"]]></title><description><![CDATA[
<p>Well, to be honest I think the primary disconnect is in epistemological understanding. The OP did not declare peptides to be a personal revolution, he/she seemingly generalised their own experience to be widely applicable.<p>Basic human thought patterns usually lead people to think that anecdotes about their personal experience are valuable for understanding the world, but this is wrong. The scientific revolution basically illustrated the flaw in this premise outside of hypothesis generation. It takes specific education to make human beings truly believe that their anecdotal experiences are mostly irrelevant beyond understanding their immediate circumstances. The proportion of humanity that truly thinks this way is relatively small.<p>Understanding the world through anecdotes still works okay-ish for a lot of areas, but ascertaining relatively subjective effects of experimental pharmaceuticals is not one of them. But to many people it's non-obvious that this is the case. And as a general method of thinking about this issue, it is just the wrong way to go about things.<p>And that's the disconnect, in my opinion. The OP drew a conclusion from a thought pattern that comes easily to human beings, but that is just wrong in this situation. Of course, perhaps this is reinforced by underlying motivations, but that's not what makes people talk past each other. These kinds of discussions are usually driven by so-called "deep disagreements" in epistemological understanding, in my experience.</p>
]]></description><pubDate>Wed, 08 Apr 2026 13:18:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47689833</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47689833</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47689833</guid></item><item><title><![CDATA[New comment by stalfie in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>I don't think that's what's being hinted at. The system card seems to say that the model is both token efficient and slow in practice. Deep research modes generally work by having many subagents/large token spend. So this is more likely due to the fact that each token just takes longer to produce, which would be because the model is simply much larger.<p>By Epoch AI's datacenter tracking methods, Anthropic has had access to the largest amount of contiguous compute since late last year. So this might simply be the end result of being the first to have the capacity to conduct a training run of this size. Or the first seemingly successful one at any rate.</p>
]]></description><pubDate>Wed, 08 Apr 2026 12:00:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47689005</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47689005</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47689005</guid></item><item><title><![CDATA[New comment by stalfie in "Peptides: where to begin?"]]></title><description><![CDATA[
<p>Non-blinded self-experimentation is not a useful branch of empiricism.<p>I had an ME/CFS patient that had tried 100s of things and documented the effects thoroughly. She had a quite impressive list. Roughly 30% had had an effect to begin with, but the trend she observed was that it lasted for around a month at most. Placebo was her overall conclusion, but she occasionally got relief anyway, so we both agreed that there was no harm in continuing. I'm sure several "peptides" are on her list by now.<p>There is nothing new under the sun, and fad cures for diffuse conditions have come and gone many times before. This is especially the case for conditions involving pain or tiredness, which are extremely sensitive to both placebo and nocebo.<p>What would be revolutionary would be 2-3 double-blinded RCTs showing a lasting effect. Which would be great if someone did! But you have to actually bother to do it. And personally I would put money on the outcome being "no effect".</p>
]]></description><pubDate>Tue, 07 Apr 2026 07:33:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47671868</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47671868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47671868</guid></item><item><title><![CDATA[New comment by stalfie in "Further human + AI + proof assistant work on Knuth's "Claude Cycles" problem"]]></title><description><![CDATA[
<p>The blind spot exploiting strategy you link to was found by an adversarial ML model...</p>
]]></description><pubDate>Sun, 29 Mar 2026 11:19:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47562185</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47562185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47562185</guid></item><item><title><![CDATA[New comment by stalfie in "You are not your job"]]></title><description><![CDATA[
<p>Late reply, but in case you check: I think most of what I said comes from sources of varying quality and salience, but at least it comes from somewhere. I just typed it all out quickly without checking anything over, so a lot might be wrong. But it's not entirely pulled out of my ass at least.<p>Evolutionary history is of course always difficult. I think the loneliness part comes mostly from the Kurzgesagt video on loneliness, as well as some other stuff here and there. Rate of infanticide is roughly correct according to a quick Google search. Rest of tribal stuff is from a variety of books and high school social anthropology. I think I actually have the "reasoning for infanticide" part from Sex at Dawn, of all places.<p>I'm always scared to run a deep research service to find the counterpoints after I type this kind of stuff out, but feel free to do so for me and dress me down. At least survivorship bias is a classic that's pretty much always worth keeping in mind on any topic.</p>
]]></description><pubDate>Thu, 26 Mar 2026 22:53:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47536881</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47536881</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47536881</guid></item><item><title><![CDATA[New comment by stalfie in "ARC-AGI-3"]]></title><description><![CDATA[
<p>Oookay. I actually tried the harness myself, and there was a visual option. It is unclear to me if that is what the models are using on the official benchmark, but it probably is. This probably means that much of my critique is invalid. However, in the process of fiddling with the harness, building a live viewer to see what was happening, and playing through the agent API myself, I might have found 3-4 bugs with the default harness/API. Dunno where to post it, so of all places I am documenting the process on HN.<p>Bug 1: The visual mode "diff" image is always black, even if the model clicked on an interactive element and there was a change. Codex fixed it in one shot, the problem was in the main session loop at agent.py (line 458).<p>Bug 2: Claude and ChatGPT can't see the 128x128 pixel images clearly, and cannot accurately place clicks on them either. Scaling up the images to 1024x1024 pixels gave the best results; Claude dropped off hard at 2048 for some reason. Here are the full test results when models were asked to hit specific (manually labeled) elements on the "vc 33" level 1 (upper blue square, lower blue square, upper yellow rectangle, lower yellow rectangle):<p>Model | 128 | 256 | 512 | 1024 | 2048<p>claude-opus-4-6 | 1/10 | 1/10 | 9/10 | 10/10 | 0/10<p>gemini-3-1-pro-preview | 10/10 | 10/10 | 10/10 | 10/10 | 10/10<p>gpt-5.4-medium | 4/10 | 8/10 | 9/10 | 10/10 | 8/10<p>Bug 3: "vc 33" level 4 is impossible to complete via the API. At least it was when I made a web-viewer to navigate the games from the API side. The "canal lock" required two clicks instead of one to transfer the "boat" when water levels were equilibrated, and after that any action whatsoever would spontaneously pop the boat back to the first column, so you could never progress.<p>"Bug" 4: This is more of a complaint on the models' behalf. A major issue is that the models never get to know where they clicked.
This is truly a bit unfair since humans get a live update of the position of their cursor at no extra cost (even a preview of the square their cursor highlights in the human version), but if models fuck up on the coordinates they often think they hit their intended targets even though they whiffed. So if that happens they note down "I hit the blue square but I guess nothing happened", and for the rest of the run they are fucked because they conclude the element is not interactive even though they picked the right element on the first try. The combination of an intermediary harness layer that let the models "preview" their cursor position before they "confirmed" their action and the 1024x1024 resolution caused a major improvement in their intended action "I want to click the blue square" actually resulting in that action. However, even then unintended misclicks often spell the end of a run (Claude 4.6 made it the furthest, which means level 2 of the "vc 33" stages, and got stuck when it missed a button and spent too much time hitting other things).<p>After I tried to fix all of the above issues, and tried to set up an optimal environment for models to get a fair shake, the models still mostly did very badly even when they identified the right interactive elements...except for Claude 4.6 Opus! Claude had at least one run where it made it to level 4 on "vc 33", but then got stuck because the blue squares it had to hit became too small, and it just couldn't get the cursor in the right spot even with the cursor preview functionality (the guiding pixel likely became too small for it to see clearly). When you read through the reasoning for the previous stages though, it didn't truly fully understand the underlying logic of the game, although it was almost there.</p>
]]></description><pubDate>Thu, 26 Mar 2026 20:37:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47535395</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47535395</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47535395</guid></item><item><title><![CDATA[New comment by stalfie in "ARC-AGI-3"]]></title><description><![CDATA[
<p>I just realized that this also means that the benchmark is in practice unverified by third parties, as the tasks are not verified to be solvable through the JSON interface. Essentially there is no guarantee that it is even possible to understand how to complete every task optimally through the JSON interface alone.<p>I assume you did not develop the puzzles by visualizing JSON yourselves, and so there might be non-obvious information that is lost in translation to JSON. Until humans optimally solve all the puzzles without ever having seen the visual version, there is no guarantee that this is even possible to do.<p>I think the only viable solution here is to release a version of the benchmark with a vision-only harness. Otherwise it is impossible to interpret what LLM progress on this benchmark actually means.</p>
]]></description><pubDate>Thu, 26 Mar 2026 12:15:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47529532</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47529532</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47529532</guid></item><item><title><![CDATA[New comment by stalfie in "ARC-AGI-3"]]></title><description><![CDATA[
<p>This counterpoint doesn't address the issue, and I would argue that it is partially bad faith.<p>Yes, making it to the test center is significantly harder, but in fact the humans could have solved it from their home PC instead, and performed the exact same. However, if they were given the same test as the LLMs, forbidden from input beyond JSON, they would have failed. And although buying robots to do the test is unfeasible, giving LLMs a screenshot is easy.<p>Without visual input for LLMs in a benchmark that humans are asked to solve visually, you are not comparing apples to apples. In fact, LLMs are given a different and significantly harder task, and in a benchmark that is so heavily weighted against the top human baseline, the benchmark starts to mean something extremely different. Essentially, if LLMs eventually match human performance on this benchmark, this will mean that they in fact exceed human performance by some unknown factor, seeing as human JSON performance is not measured.<p>Personally, this hugely decreased my enthusiasm for the benchmark. If your benchmark is to be a North star to AGI, labs should not be steered towards optimizing superhuman JSON parsing skills. It is much more interesting to steer them towards visual understanding, which is what will actually lead the models out into the world.</p>
]]></description><pubDate>Thu, 26 Mar 2026 08:46:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47528037</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47528037</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47528037</guid></item><item><title><![CDATA[New comment by stalfie in "You are not your job"]]></title><description><![CDATA[
<p>This is incredibly naive. Hunter-gatherer communities, especially those in regions without an abundance of food, are and were extremely selective about who was accepted and who wasn't. This starts from infancy, where non-desirable babies were simply killed. Estimates vary greatly, but perhaps around a third to half of "modern" hunter-gatherer tribes practice infanticide. The stated reasoning behind infanticide is often extremely vicious and comes down to "he/she is not a good fit for the tribe", or in other words "nobody likes him/her". This fact alone might be one of the major explanations for the high rate of prehistoric infant mortality.<p>If you are even allowed to grow up and become an individual, things might be somewhat better once you are part of the in-group, but that does not factor in that human empathy has an overall tendency to switch off if you're not. Even if you're loved because you're kin, your neighboring tribe might still kill you, or you and your kin might kill them, for entirely petty or cynical reasons. The prehistoric bone record supports this as well: seemingly human-weapon-related injuries are the most common cause of death.<p>You can also examine your own emotions to get some idea of our evolutionary environment. Loneliness hurts, to the point where it has measurable negative health impacts equivalent to smoking a pack of cigarettes each day. Your brain is screaming at you not to be lonely, but why? Well, in our ancestral environment, being excluded from the social group meant death, so most individuals that did not have a profound and visceral fear of that happening got their genes consistently removed from the gene pool. For loneliness to be that big of a deal, being excluded must have been an ever-present possibility.
If everyone loved and accepted everyone unconditionally, this emotional state would simply not have evolved.<p>Humans quickly become extremely brutal once the environment necessitates it, up to and including cannibalizing their own kin. Infanticide and the murder of both in-group and out-group members are historically commonplace because they were also commonplace prehistorically. Even modern tribes that live in relative abundance are still brutal in many ways to this very day.<p>But of course, when you look at any group of individuals in a tribe, survivorship bias will dictate that it all looks nice and rosy. You might want to check the skeletons in the cave before you pick that as your conclusion.</p>
]]></description><pubDate>Mon, 23 Mar 2026 13:02:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47488917</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47488917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47488917</guid></item><item><title><![CDATA[New comment by stalfie in "Reports of code's death are greatly exaggerated"]]></title><description><![CDATA[
<p>Well, to be fair, judging by the shift in the general vibes of the average HN comment over the past 3 years, better use of agents and advanced models DID solve the previous temporary setbacks. The techno-optimists were right, and the naysayers wrong.<p>Over the course of about 2 years, the general consensus has shifted from "it's a fun curiosity" to "it's just better Stack Overflow" to "some people say it's good" to "well it can do some of my job, but not most of it". I think for a lot of people, it has already crossed into "it can do most of my job, but not all of it" territory.<p>So unless we have finally reached the mythical plateau, if you just go by the trend, in about a year most people will be in the "it can do most of my job, but not all of it" territory, and a year or two after that most people will be facing a tool that can do anything they can do. And perhaps, if you factor in optimisation strategies like the Karpathy loop, a tool that can do everything, but better.<p>Upper management might be proven right.</p>
]]></description><pubDate>Sun, 22 Mar 2026 19:17:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47481033</link><dc:creator>stalfie</dc:creator><comments>https://news.ycombinator.com/item?id=47481033</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47481033</guid></item></channel></rss>