Hacker News: mesmertech

New comment by mesmertech in "Fable 5 remotion video benchmark and examples"

mesmertech — Tue, 09 Jun 2026 21:36:12 +0000

Overall an improvement over Opus 4.8, but I'd still say Gemini 3.1 Pro has more of an artistic vision, even though it sometimes fails tool calls and writes buggy code.

I know almost everyone is interested only in the SWE stuff, but this has been a good eval for me to think about how big a model is, how "creative" it is at generating new ideas, etc.

More results from Fable, with comparisons against Gemini, Opus, and some open source models: https://mesmer.tools/benchmarks/ai-video-generation

Fable 5 remotion video benchmark and examples

mesmertech — Tue, 09 Jun 2026 21:36:12 +0000

Article URL: https://mesmer.tools/benchmarks/ai-video-generation

Comments URL: https://news.ycombinator.com/item?id=48468095

Points: 5

# Comments: 1

Ask HN: Anyone else seeing serious degradation in DX with Opus 4.8?

mesmertech — Mon, 01 Jun 2026 12:40:05 +0000

As an Anthropic fanboy (check my prev. comments), this is the first Opus release where I feel like the model is just not pleasant to talk to, not to mention untrustworthy.

There were two cases where I lost confidence in it. In one, it started with 2 random echo commands: https://snipboard.io/tpqfP2.jpg In the other, I asked it to create a new landing page and it just deleted an existing app page as a replacement.

I'm not exactly sure if this is a harness problem with the Claude Code updates (maybe the system prompt changed) or if it's just the model itself that has gotten too "safety-pilled". I've been seeing similar opinions from devs complaining that the model seems to distrust the user's intentions.

Either way, this is the first model release where I've downgraded to the previous model, since I was already pretty happy with it. That should make it clear whether it's a model problem or the harness.

Comments URL: https://news.ycombinator.com/item?id=48356061

Points: 2

# Comments: 0

New comment by mesmertech in "Claude Opus 4.8"

mesmertech — Thu, 28 May 2026 17:24:55 +0000

I think GPT 5.6 is coming out today, so you might wanna wait

New comment by mesmertech in "Claude Opus 4.8"

mesmertech — Thu, 28 May 2026 17:23:57 +0000

/model claude-opus-4-8

seems to work but idk why they never set it so you can see it in the /model list.

"what model are you

I'm Claude Opus (claude-opus-4-8), running in Claude Code."

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 22:01:12 +0000

Yep, sorry, was just pulling it out of my rear. Not like it's a market trend that nearly every enterprise uses Anthropic or OpenAI models for coding, or that Anthropic has had such ridiculous growth that they're 10x-ing year over year

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 21:52:42 +0000

My point was that even OpenRouter, the one place people looking for open source SOTA models go to, doesn't definitively have open source models at the top. Esp. considering that quite a lot of closed model usage goes through AWS, GCP, Azure, etc., probably dwarfing the usage on OpenRouter by a huge factor

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 21:48:10 +0000

As long as closed source stays 6 months ahead, which is the current difference. Although this is hard to figure out from simple percent-based coding benchmarks, you def. notice it when you're actually trying to do a long task. Even simple things like UI "taste" are enough for me to use Opus instead of 5.5, even though 5.5 is strictly better for anything that doesn't have a UI, i.e. backend, scripts, making agent workflows, etc.

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 21:44:43 +0000

As long as closed models are 6 months ahead, I won't be switching from them to open source models that match the SOTA of 6 months ago. Maybe it's just a different calculation if you're in a job, but as an indiehacker I'll take any edge I can get

Ofc, again, I can be convinced to switch if there's a clear speed difference, like 5x+ for an open source SOTA model, even if it only matches the SOTA of 6 months ago

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 21:40:35 +0000

Based on the current market for LLMs, I'd say my use of the general "you" is fine. Even OpenRouter, which doesn't capture all SOTA closed model usage but captures nearly all open source model usage, has Opus 1st (over the last week) in the "Programming" category and 3rd in the overall rankings

https://openrouter.ai/rankings

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 18:14:23 +0000

Cost for the value delivered. Like if you offered the current SOTA open source models at $0.1/M, I still think I'd be using Opus or 5.5 at $30/M. Or take GPT 5, which was released in Aug '25: I don't think I'd use it for coding even at $0.1. I'd def. find other uses for it (translations, agentic workflows, prompt guards, etc.), but for coding I don't think I'd ever completely switch to a SOTA open model

Unless ofc there was an actual speed difference. The only reason I'd be willing to go with a model that's a couple of percent worse than the current best is if the speed was at least 5x higher. Looking forward to Kimi K2.6 being offered publicly by Cerebras

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 17:29:56 +0000

For coding you always want to go with the best model in the category, not something that would have been the best model 1 year back, which is what GLM 5.1 is. And I'm saying that as a big fan of GLM, cause I run a translation site where GLM is good enough for the price.

Most of the money right now is in coding. OpenAI and Anthropic just have to be 6 months ahead of SOTA open source models and they'll capture most of the enterprise and dev market

New comment by mesmertech in "I think Anthropic and OpenAI have found product-market fit"

mesmertech — Wed, 27 May 2026 17:27:45 +0000

If nothing else, this blog did give me the idea that I should split my $200 Claude Max plan in two: a $100 CC Max plan and a $100 Codex plan, esp. because Claude is now offering 1.5x weekly limits, so the 5x usage is now more like 7.5x usage.
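The 7.5x figure is just the two multipliers combined. A minimal sketch of that arithmetic, assuming the promotion stacks multiplicatively on top of the plan's usage multiplier (the function and constant names here are made up for illustration, not anything from Anthropic):

```python
# Assumption: the plan's "5x" usage multiplier and the 1.5x weekly-limit
# promotion simply multiply together, as the comment implies.
BASE_MULTIPLIER = 5.0  # "5x usage" on the $100 plan, per the comment
PROMO_FACTOR = 1.5     # temporary 1.5x weekly limits, per the comment


def effective_multiplier(base: float, promo: float) -> float:
    """Combined usage multiplier under the multiplicative assumption."""
    return base * promo


print(effective_multiplier(BASE_MULTIPLIER, PROMO_FACTOR))  # 7.5
```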

New comment by mesmertech in "Claude Code weekly limits increasing 50% till July 13"

mesmertech — Wed, 13 May 2026 19:43:18 +0000

I'm just hoping they release Mythos soon now that it seems like they have enough compute to do promotions like this

Claude Code weekly limits increasing 50% till July 13

mesmertech — Wed, 13 May 2026 19:38:12 +0000

Article URL: https://twitter.com/ClaudeDevs/status/2054639777685934564

Comments URL: https://news.ycombinator.com/item?id=48126429

Points: 10

# Comments: 9

New comment by mesmertech in "Claude Opus 4.7"

mesmertech — Thu, 16 Apr 2026 15:19:55 +0000

I think that was a typo on my end, it's "/model claude-opus-4-7", not "/model claude-opus-4.7"

New comment by mesmertech in "Claude Opus 4.7"

mesmertech — Thu, 16 Apr 2026 15:06:40 +0000

I'm on the $200 Max plan, so maybe it's that?

New comment by mesmertech in "Claude Opus 4.7"

mesmertech — Thu, 16 Apr 2026 15:05:58 +0000

I think it's just a visual/default thing, cause Opus 4.0 isn't offered on Claude Code anymore. And Opus 4.7 is in their official docs as a model you can switch to in Claude Code

Just ask it what model it is (even in a new chat).

what model are you?

I'm Claude Opus 4 (model ID: claude-opus-4-7).

https://support.claude.com/en/articles/11940350-claude-code-...

New comment by mesmertech in "Claude Opus 4.7"

mesmertech — Thu, 16 Apr 2026 14:50:50 +0000

It's not showing up in Claude Code by default on the latest version. Apparently this is how to set it:

/model claude-opus-4-7

This comes from Anthropic's support page, so hopefully they didn't hallucinate the docs, cause the model name in Claude Code says:

/model claude-opus-4-7
⎿ Set model to Opus 4

what model are you?

I'm Claude Opus 4 (model ID: claude-opus-4-7).

New comment by mesmertech in "Elevated errors on Claude.ai, API, Claude Code"

mesmertech — Wed, 15 Apr 2026 14:49:32 +0000

We went from "peak hours" meaning 2x usage plus slower, to it now just throwing a 500 error

https://mesmer.tools/random/is-it-peak-hours