Hacker News: suttontom

New comment by suttontom in "Anthropic apologizes for invisible Claude Fable guardrails"

suttontom — Fri, 12 Jun 2026 18:40:23 +0000

You're wrong in lots of ways.

Some model cards do show regressions on benchmarks for newer models on specific tasks: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...

This wasn't a new model, but it shows that an update backed by better numbers can still make a model worse: https://openai.com/index/sycophancy-in-gpt-4o/

Slight increases in performance/benchmarks may just be noise: https://arxiv.org/pdf/2602.07150

New comment by suttontom in "Workers are spending over 6 hours a week botsitting AI, fueling job frustration"

suttontom — Fri, 12 Jun 2026 02:00:09 +0000

I wouldn't agree with that. The issue with software is that the people you build things for are usually anonymous and you'll never meet them, but if you've ever built software that helped someone and seen it happen, it feels really good.

New comment by suttontom in "LLMs are eroding my software engineering career and I don't know what to do"

suttontom — Sun, 07 Jun 2026 18:03:38 +0000

This is such a tired, meaningless argument. In 10 years of professional software engineering at a large company, I've never seen a human so confidently and consistently create and send out seemingly well-reasoned code that's as wrong as what SOTA models produce through CC or Codex. If a human did this, they would either be fired or stay a perpetual junior whom no one wants to work with.

Also, if a human does this, you can replace them and get a human who will not do it. The default for an LLM is to generate plausible-looking text that may or may not be completely incoherent. That is not the default for a human. Again, if you find that your colleague consistently fabricates APIs, you can hire someone who isn't crazy instead, but you cannot do the same with LLMs.

New comment by suttontom in "LLMs are eroding my software engineering career and I don't know what to do"

suttontom — Sun, 07 Jun 2026 17:53:39 +0000

This is commonly known as "LLM-as-a-judge", and anecdotally, several people I know who write code using multiple models (e.g. via OpenRouter) say it's surprisingly effective. It's strange that there don't appear to be any major papers on it since ~early 2025, which at this point is basically ancient history.

New comment by suttontom in "LLMs are eroding my software engineering career and I don't know what to do"

suttontom — Sun, 07 Jun 2026 17:10:11 +0000

Ah yes, the magical equivalent of "you are a senior software engineer who writes bug-free code".

IME, people would benefit greatly from the tedious, time-consuming process of running the same prompt sequence/session with the exact same model multiple times. It becomes clear extremely quickly how capable but unreliable and inconsistent a model can be, even when given the same context. If you have ever completed a long, complicated task with an agent, lost the session, and tried doing the same thing again from scratch, you may have seen subtle changes in the model's thinking that lead it to accept or reject certain paths and to ignore or incorporate prompt instructions like the one you've provided.
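A minimal sketch of that repeat-the-same-prompt exercise, assuming the model is wrapped in a hypothetical `str -> str` function. `consistency_report` and `fake_model` are illustrative names, not part of any real SDK, and `fake_model` is just a seeded random stub standing in for a real, nondeterministic model call.

```python
import random
from collections import Counter
from typing import Callable


def consistency_report(run: Callable[[str], str], prompt: str, trials: int = 10) -> dict:
    """Run the same prompt `trials` times and summarize how much the outputs vary."""
    outputs = [run(prompt) for _ in range(trials)]
    counts = Counter(outputs)
    most_common_output, most_common_count = counts.most_common(1)[0]
    return {
        "trials": trials,
        "distinct_outputs": len(counts),
        "most_common": most_common_output,
        # Fraction of runs that agreed with the most frequent output.
        "agreement": most_common_count / trials,
    }


# Hypothetical stand-in for a real model call: usually follows the
# instruction, sometimes drifts, to mimic run-to-run inconsistency.
rng = random.Random(0)


def fake_model(prompt: str) -> str:
    return "calls tool" if rng.random() < 0.8 else "ignores instruction"


if __name__ == "__main__":
    print(consistency_report(fake_model, "Always call the search tool first.", trials=20))
```

With a real agent you'd compare whole transcripts or the decisions taken at each step rather than exact strings, but even exact-match agreement on a short instruction tends to make the variance obvious.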

New comment by suttontom in "Expanding Project Glasswing"

suttontom — Tue, 02 Jun 2026 19:04:29 +0000

Isn't that kind of what they're doing with this rollout? Except they're just hand-picking the companies.

New comment by suttontom in "Can we have the day off?"

suttontom — Sun, 31 May 2026 02:18:52 +0000

What is your problem? Do you think something is an opinion piece just because it has a byline? What about https://www.forrester.com/press-newsroom/forrester-impact-ai...? Is there literally any evidence you'd accept?

You know companies lie and overstate things, right?

New comment by suttontom in "Claude Opus 4.8"

suttontom — Thu, 28 May 2026 18:43:56 +0000

Do you know if anyone has trained a model on, say, only pre-2017 data and tried to get it to come up with "Attention Is All You Need"? If it did, would you say that was only because it's a synthesis of prior art? If so, what isn't?

New comment by suttontom in "Claude Opus 4.8"

suttontom — Thu, 28 May 2026 18:38:44 +0000

Are you joking? Is there literally "nothing" you can imagine that Claude can't do?

New comment by suttontom in "Can we have the day off?"

suttontom — Thu, 28 May 2026 18:31:01 +0000

https://www.shrm.org/topics-tools/news/technology/ai-layoffs...

https://cmr.berkeley.edu/2025/10/seven-myths-about-ai-and-pr...

https://www.technologyreview.com/2026/05/26/1137855/a-realit...

You?

New comment by suttontom in "The worst job interview I ever had"

suttontom — Thu, 28 May 2026 02:18:52 +0000

This is a good example of being bad at writing code.

New comment by suttontom in "Training our own AI models"

suttontom — Thu, 28 May 2026 01:58:09 +0000

Not to be cynical, but do you think this would matter at all? Are you saying that companies would hold themselves to their missions, or even to something legally binding?

> "Google is not a conventional company. We do not intend to become one."

> OpenAI being founded as a nonprofit and becoming for-profit.

> Didn't Anthropic literally say they wouldn't train on your data or keep it for longer than 30 days unless legally required, and then decided to opt people in to having their conversations used for training?

New comment by suttontom in "The just-say-no engineer was a ZIRP phenomenon"

suttontom — Thu, 28 May 2026 01:42:26 +0000

Am I going crazy? Is a PR with 94 commits that adds 1,600 LoC actually considered "very reviewable"? Can someone please tell me if I'm crazy?

New comment by suttontom in "Constraint Decay: The Fragility of LLM Agents in Back End Code Generation"

suttontom — Tue, 26 May 2026 06:03:28 +0000

Models are not innately backwards-compatible. Both OpenAI and Anthropic encourage running evaluations that compare the performance of your existing agent workflows on new models before switching to the newest one, because you may encounter regressions. I've seen lengthy, long-horizon multi-agent workflows start breaking after moving to a newer model because, for some reason, a prompt instructing the model to call a tool, which worked 99/100 times before, suddenly stops working and has to be modified.
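A minimal sketch of that kind of pre-upgrade check, assuming each model is wrapped in a hypothetical `str -> str` function and each eval case carries its own pass/fail check. `EvalCase`, `pass_rate`, and `regressions` are illustrative names, not part of OpenAI's or Anthropic's tooling; the point is just to run the same cases against the current and candidate model and flag anything whose pass rate drops.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Model = Callable[[str], str]  # hypothetical wrapper around a real model call


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the model's output is acceptable


def pass_rate(model: Model, cases: Sequence[EvalCase], trials: int = 1) -> Dict[str, float]:
    """Fraction of `trials` runs that pass, per case (repeat runs because outputs vary)."""
    return {
        c.name: sum(c.check(model(c.prompt)) for _ in range(trials)) / trials
        for c in cases
    }


def regressions(current: Model, candidate: Model, cases: Sequence[EvalCase],
                trials: int = 1, tolerance: float = 0.0) -> List[str]:
    """Names of cases where the candidate's pass rate falls below the current model's."""
    old = pass_rate(current, cases, trials)
    new = pass_rate(candidate, cases, trials)
    return [name for name in old if new[name] < old[name] - tolerance]
```

For example, a case like `EvalCase("calls_tool", "Use the search tool.", lambda out: "TOOL_CALL" in out)` would catch the "tool instruction silently stops working" failure. Running several trials per case matters, since a 99/100 pass rate won't show up as a difference with a single run.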

New comment by suttontom in "Gemini Omni"

suttontom — Wed, 20 May 2026 18:45:07 +0000

I think LLMs are extremely useful, mostly for coding. But saying we're extremely close to an AI that can "reliably come up with novel actions for physical robots" feeds the hype that these tools can do, or are very close to doing, far more than they're actually capable of, especially when it comes to reliability. That's the kind of rhetoric that has partially created this bubble, because in no world is what you're saying realistic.

The worst thing is when someone cites a video or a demo of an AI doing something and says, "See! It's here!" Remember when the Devin video came out years ago?

You can say "eventually" AI will be able to do xyz, but eventually the sun will blow up, too, so what the fuck are we talking about?

New comment by suttontom in "Incident Report: Railway Blocked by Google Cloud [resolved]"

suttontom — Wed, 20 May 2026 05:58:09 +0000

"UniSuper’s production Google Cloud VMware Engine (GCVE) private cloud was automatically deleted one year after its creation due to a misconfiguration in how it was created. When it was created, there was a bug in the creation script which passed a null value."

That's pretty amazing. Not a cascading failure from someone changing a config deep inside a system and causing a bunch of unintended effects, just someone who made a mistake writing a shell script?

New comment by suttontom in "Gemini Omni"

suttontom — Wed, 20 May 2026 05:49:52 +0000

They can't even reliably follow instructions from text. I think "it's just around the corner/just wait x months/just wait and see bro" is one of the most telling signs of AI psychosis.

New comment by suttontom in "Google I/O"

suttontom — Tue, 19 May 2026 22:56:21 +0000

You do know that this was the same thing people said about crypto, right? And that the Internet of Things, where your fridge connects to the Internet, is hated by most consumers and had nowhere near the impact that IoT evangelists said it would?

New comment by suttontom in "Google I/O"

suttontom — Tue, 19 May 2026 22:52:34 +0000

This is such a creepy dystopian thing to say. Don't you realize that? Isn't this "yes, there will be pain, but the future is inevitable and we must go forward into it" attitude straight out of multiple horror and sci-fi stories?

Edit: Never mind, parent is an LLM/bot.

New comment by suttontom in "Microsoft AI CEO forecasts human-level AI in 18 months"

suttontom — Mon, 18 May 2026 03:03:08 +0000

Demos are also often misleading and cherry-picked. One cool AI demo that breaks down 99% of the time when circumstances change slightly has played an outsized part in most of the AI insanity we are living with.