<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: puppystench</title><link>https://news.ycombinator.com/user?id=puppystench</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 17:15:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=puppystench" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by puppystench in "An update on recent Claude Code quality reports"]]></title><description><![CDATA[
<p>The Claude UI still only has "adaptive" reasoning for Opus 4.7, making it functionally useless for scientific/coding work compared to older models (as Opus 4.7 will randomly stop reasoning after a few turns, even when prompted otherwise). There's no way this is just a bug and not a choice to save tokens.</p>
]]></description><pubDate>Thu, 23 Apr 2026 19:14:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47880253</link><dc:creator>puppystench</dc:creator><comments>https://news.ycombinator.com/item?id=47880253</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47880253</guid></item><item><title><![CDATA[New comment by puppystench in "GPT-5.5"]]></title><description><![CDATA[
<p>From the announcement page:<p>>For API developers, gpt-5.5 will soon be available in the Responses and Chat Completions APIs at $5 per 1M input tokens and $30 per 1M output tokens, with a 1M context window.</p>
]]></description><pubDate>Thu, 23 Apr 2026 19:09:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47880186</link><dc:creator>puppystench</dc:creator><comments>https://news.ycombinator.com/item?id=47880186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47880186</guid></item><item><title><![CDATA[New comment by puppystench in "GPT-5.5"]]></title><description><![CDATA[
<p>For API usage, GPT-5.5 is 2x the price of GPT-5.4, ~4x the price of GPT-5.1, and ~10x the price of Kimi-2.6.<p>Unfortunately I think the lesson they took from Anthropic is that devs get really reliant on, and even addicted to, coding agents, and they'll happily pay any amount for even small benefits.</p>
]]></description><pubDate>Thu, 23 Apr 2026 19:03:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47880106</link><dc:creator>puppystench</dc:creator><comments>https://news.ycombinator.com/item?id=47880106</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47880106</guid></item><item><title><![CDATA[New comment by puppystench in "Claude Opus 4.7"]]></title><description><![CDATA[
<p>Does this mean Claude no longer outputs the full raw reasoning, only summaries? At one point, exposing the LLM's full CoT was considered a core safety tenet.</p>
]]></description><pubDate>Thu, 16 Apr 2026 16:27:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47795812</link><dc:creator>puppystench</dc:creator><comments>https://news.ycombinator.com/item?id=47795812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47795812</guid></item><item><title><![CDATA[New comment by puppystench in "Claude mixes up who said what"]]></title><description><![CDATA[
<p>I believe you're right: it's an issue of the model misinterpreting text that sounds like a user message as an actual user message. It's a known phenomenon: <a href="https://arxiv.org/abs/2603.12277" rel="nofollow">https://arxiv.org/abs/2603.12277</a></p>
]]></description><pubDate>Thu, 09 Apr 2026 16:41:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47705894</link><dc:creator>puppystench</dc:creator><comments>https://news.ycombinator.com/item?id=47705894</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47705894</guid></item><item><title><![CDATA[New comment by puppystench in "Claude mixes up who said what"]]></title><description><![CDATA[
<p>>Several people questioned whether this is actually a harness bug like I assumed, as people have reported similar issues using other interfaces and models, including chatgpt.com. One pattern does seem to be that it happens in the so-called “Dumb Zone” once a conversation starts approaching the limits of the context window.<p>I also don't think this is a harness bug. There's research* showing that models infer the source of text from how it sounds, not from the actual role labels the harness would provide. The messages from Claude here sound like user messages ("Please deploy") rather than typical Claude output, which tricks its later self into thinking they came from the user.<p>*<a href="https://arxiv.org/abs/2603.12277" rel="nofollow">https://arxiv.org/abs/2603.12277</a><p>Presumably this is also why prompt injection works at all.</p>
]]></description><pubDate>Thu, 09 Apr 2026 15:49:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47705269</link><dc:creator>puppystench</dc:creator><comments>https://news.ycombinator.com/item?id=47705269</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47705269</guid></item></channel></rss>