<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: suttontom</title><link>https://news.ycombinator.com/user?id=suttontom</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 12:27:49 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=suttontom" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by suttontom in "Anthropic apologizes for invisible Claude Fable guardrails"]]></title><description><![CDATA[
<p>You're wrong in lots of ways.<p>Some model cards do show newer models regressing on benchmarks for specific tasks: <a href="https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf" rel="nofollow">https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...</a><p>This wasn't a new model, but an update that looks better on the numbers can still make a model worse: <a href="https://openai.com/index/sycophancy-in-gpt-4o/" rel="nofollow">https://openai.com/index/sycophancy-in-gpt-4o/</a><p>The slight increases in performance/benchmarks may be just noise: <a href="https://arxiv.org/pdf/2602.07150" rel="nofollow">https://arxiv.org/pdf/2602.07150</a></p>
]]></description><pubDate>Fri, 12 Jun 2026 18:40:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48507827</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48507827</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48507827</guid></item><item><title><![CDATA[New comment by suttontom in "Workers are spending over 6 hours a week botsitting AI, fueling job frustration"]]></title><description><![CDATA[
<p>I wouldn't agree with that. The issue with software is that the people you make things for are usually anonymous and you'll never meet them, but if you've ever built software that helped someone and you witnessed it, it feels really good.</p>
]]></description><pubDate>Fri, 12 Jun 2026 02:00:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48498946</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48498946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48498946</guid></item><item><title><![CDATA[New comment by suttontom in "LLMs are eroding my software engineering career and I don't know what to do"]]></title><description><![CDATA[
<p>This is such a tired, meaningless argument. In 10 years of professional software engineering at a large company, I've never seen a human so confidently and consistently create and send out seemingly well-reasoned code that's as wrong as what SOTA models using CC or Codex produce. If a human did this, they would be fired or perpetually remain a junior who no one wants to work with.<p>Also, if a human does this, you can replace them and get a human who will not do it. The default for an LLM is to generate plausible-looking text that may or may not be completely incoherent. That is not the default for a human. Again, if you find that your colleague consistently fabricates APIs, you can hire someone who isn't crazy instead, but you cannot do the same with LLMs.</p>
]]></description><pubDate>Sun, 07 Jun 2026 18:03:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=48437196</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48437196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48437196</guid></item><item><title><![CDATA[New comment by suttontom in "LLMs are eroding my software engineering career and I don't know what to do"]]></title><description><![CDATA[
<p>This is commonly known as "LLM-as-a-judge", and anecdotally, multiple people I know who write code through OpenRouter or with multiple models say it's surprisingly effective. It's strange that there don't appear to be any major papers on it since ~early 2025, which at this point is basically ancient history.</p>
]]></description><pubDate>Sun, 07 Jun 2026 17:53:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48437120</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48437120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48437120</guid></item><item><title><![CDATA[New comment by suttontom in "LLMs are eroding my software engineering career and I don't know what to do"]]></title><description><![CDATA[
<p>Ah yes, the magical equivalent of "you are a senior software engineer who writes bug-free code".<p>IME people would benefit greatly from the process, albeit tedious and time-consuming, of testing out the same prompt sequence/session with the exact same model multiple times. It becomes clear extremely quickly how capable <i>but unreliable and inconsistent</i> a model can be even when given the same context. If you have ever completed a long, complicated task with an agent, lost the session, and then tried doing the same thing again from scratch, you may have seen the subtle changes in the model's thinking that lead it to accept or reject certain paths and to ignore or incorporate prompt instructions like the one you've provided.</p>
]]></description><pubDate>Sun, 07 Jun 2026 17:10:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48436747</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48436747</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48436747</guid></item><item><title><![CDATA[New comment by suttontom in "Expanding Project Glasswing"]]></title><description><![CDATA[
<p>Isn't that kind of what they're doing with this rollout? Except they're just hand picking the companies.</p>
]]></description><pubDate>Tue, 02 Jun 2026 19:04:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=48374710</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48374710</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48374710</guid></item><item><title><![CDATA[New comment by suttontom in "Can we have the day off?"]]></title><description><![CDATA[
<p>What is your problem? Do you think something is an opinion piece just because it has a byline? What about <a href="https://www.forrester.com/press-newsroom/forrester-impact-ai-jobs-forecast" rel="nofollow">https://www.forrester.com/press-newsroom/forrester-impact-ai...</a>? Is there literally any evidence you'd accept?<p>You know companies lie and overstate things, right?</p>
]]></description><pubDate>Sun, 31 May 2026 02:18:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48342485</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48342485</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48342485</guid></item><item><title><![CDATA[New comment by suttontom in "Claude Opus 4.8"]]></title><description><![CDATA[
<p>Do you know if anyone has trained, say, a pre-2017 model and tried to get it to come up with Attention Is All You Need? If it did, would you say that was only because it's a synthesis of prior art? If so, what isn't?</p>
]]></description><pubDate>Thu, 28 May 2026 18:43:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48313562</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48313562</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48313562</guid></item><item><title><![CDATA[New comment by suttontom in "Claude Opus 4.8"]]></title><description><![CDATA[
<p>Are you joking? Is there literally "nothing" you can imagine that Claude can't do?</p>
]]></description><pubDate>Thu, 28 May 2026 18:38:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=48313485</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48313485</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48313485</guid></item><item><title><![CDATA[New comment by suttontom in "Can we have the day off?"]]></title><description><![CDATA[
<p><a href="https://www.shrm.org/topics-tools/news/technology/ai-layoffs-transformation-scapegoat" rel="nofollow">https://www.shrm.org/topics-tools/news/technology/ai-layoffs...</a><p><a href="https://cmr.berkeley.edu/2025/10/seven-myths-about-ai-and-productivity-what-the-evidence-really-says/" rel="nofollow">https://cmr.berkeley.edu/2025/10/seven-myths-about-ai-and-pr...</a><p><a href="https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/" rel="nofollow">https://www.technologyreview.com/2026/05/26/1137855/a-realit...</a><p>You?</p>
]]></description><pubDate>Thu, 28 May 2026 18:31:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48313374</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48313374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48313374</guid></item><item><title><![CDATA[New comment by suttontom in "The worst job interview I ever had"]]></title><description><![CDATA[
<p>This is a good example of being bad at writing code.</p>
]]></description><pubDate>Thu, 28 May 2026 02:18:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48303583</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48303583</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48303583</guid></item><item><title><![CDATA[New comment by suttontom in "Training our own AI models"]]></title><description><![CDATA[
<p>Not to be cynical, but do you think this would matter at all? Are you saying that companies would hold themselves to their missions or even something that's legally binding?<p>> "Google is not a conventional company. We do not intend to become one."<p>> OpenAI being founded as a nonprofit and becoming for-profit.<p>> Didn't Anthropic literally say they wouldn't train on your data or keep it for longer than 30 days unless legally required, and then decide to opt people in to having their conversations used for training?</p>
]]></description><pubDate>Thu, 28 May 2026 01:58:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48303426</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48303426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48303426</guid></item><item><title><![CDATA[New comment by suttontom in "The just-say-no engineer was a ZIRP phenomenon"]]></title><description><![CDATA[
<p>Am I going crazy? Is a PR with 94 commits that adds 1,600 LoC actually considered "very reviewable"? Please someone tell me if I'm crazy?</p>
]]></description><pubDate>Thu, 28 May 2026 01:42:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=48303289</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48303289</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48303289</guid></item><item><title><![CDATA[New comment by suttontom in "Constraint Decay: The Fragility of LLM Agents in Back End Code Generation"]]></title><description><![CDATA[
<p>Models are not innately backwards-compatible. Both OpenAI and Anthropic encourage running evaluations and comparing the performance of your existing agent workflows against new models before just stepping up to the newest one because you may encounter regressions. I myself have seen lengthy/long-horizon multi-agent workflows begin breaking after moving to a newer model because for some reason the prompt containing an instruction to call a tool that worked 99/100 times before suddenly just stops working and needs to be modified.</p>
]]></description><pubDate>Tue, 26 May 2026 06:03:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=48275610</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48275610</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48275610</guid></item><item><title><![CDATA[New comment by suttontom in "Gemini Omni"]]></title><description><![CDATA[
<p>I think LLMs are extremely useful, mostly for coding. But saying we're extremely close to an AI that can "reliably come up with novel actions for physical robots" feeds into the hype that these tools can do, or are very close to doing, way more than they're actually capable of, especially when we talk about reliability. That's the kind of rhetoric that has partially created this bubble, because in no world is what you're saying realistic.<p>The worst thing is when someone cites a video or a demo of an AI doing something and says, "See! It's here!" Remember when the Devin video came out years ago?<p>You can say "eventually" AI will be able to do xyz, but eventually the sun will blow up, too, so what the fuck are we talking about?</p>
]]></description><pubDate>Wed, 20 May 2026 18:45:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48212193</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48212193</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48212193</guid></item><item><title><![CDATA[New comment by suttontom in "Incident Report: Railway Blocked by Google Cloud [resolved]"]]></title><description><![CDATA[
<p>"UniSuper’s production Google Cloud VMware Engine (GCVE) private cloud was automatically deleted one year after it’s creation due to a misconfiguration in how it was created. When it was created, there was a bug in the creation script which passed a null value."<p>That's pretty amazing. Not due to a cascading failure from someone changing a config deep inside of a system that caused a bunch of unintended effects, just someone who messed up writing a shell script?</p>
]]></description><pubDate>Wed, 20 May 2026 05:58:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48203645</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48203645</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48203645</guid></item><item><title><![CDATA[New comment by suttontom in "Gemini Omni"]]></title><description><![CDATA[
<p>They can't even reliably follow instructions from text. I think "it's just around the corner/just wait x months/just wait and see bro" is one of the most telling signs of AI psychosis.</p>
]]></description><pubDate>Wed, 20 May 2026 05:49:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48203591</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48203591</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48203591</guid></item><item><title><![CDATA[New comment by suttontom in "Google I/O"]]></title><description><![CDATA[
<p>You do know that this was the same thing people said about crypto, right? And that the internet of things where your fridge connects to the Internet is hated by most consumers and had nowhere near the impact that IoT evangelists said it would?</p>
]]></description><pubDate>Tue, 19 May 2026 22:56:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48200774</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48200774</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48200774</guid></item><item><title><![CDATA[New comment by suttontom in "Google I/O"]]></title><description><![CDATA[
<p>This is such a creepy dystopian thing to say. Don't you realize that? Isn't this "yes, there will be pain, but the future is inevitable and we must go forward into it" attitude straight out of multiple horror and sci-fi stories?<p>Edit: Never mind, parent is an LLM/bot.</p>
]]></description><pubDate>Tue, 19 May 2026 22:52:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48200751</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48200751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48200751</guid></item><item><title><![CDATA[New comment by suttontom in "Microsoft AI CEO forecasts human-level AI in 18 months"]]></title><description><![CDATA[
<p>Demos are also often misleading and cherry-picked. Using AI to do one cool demo that breaks down 99% of the time when circumstances slightly change has played an outsized part in most of the AI insanity we are living with.</p>
]]></description><pubDate>Mon, 18 May 2026 03:03:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=48175151</link><dc:creator>suttontom</dc:creator><comments>https://news.ycombinator.com/item?id=48175151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48175151</guid></item></channel></rss>