Hacker News: muyuu

New comment by muyuu in "Claude Opus 4.7"

muyuu — Thu, 16 Apr 2026 17:07:25 +0000

Currently GPT just works much better, and so does Gemini, but it's more expensive right now. Going by Opencode's stats, their claim is that Gemini is the current best model, followed by GPT 5.4, on their benchmarks, but the difference is slim.

My personal experience is best with GPT, but that could be down to the specific kind of work I use it for, which is heavy on maths and cpp (and some LISP).

New comment by muyuu in "Token efficient source code representation using virtual filesystem"

muyuu — Thu, 16 Apr 2026 16:47:01 +0000

I don't use Claude Code either, in fairness, but when I buy tokens from them that's an exposure I already have. These days I use opencode and I work local-first, with remote models as a fallback. GPT and Gemini have been giving me better results than Opus for a while.

Would you install someone else's binary blob and give it that many permissions and your API keys? The measures required to run it in a somewhat trustless way complicate things a lot.
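For what it's worth, the local-first, remote-as-fallback workflow mentioned above can be sketched roughly like this. This is only a minimal sketch: `complete_with_fallback` and the backend callables are hypothetical names for illustration, not part of opencode or any provider's API. Each backend is assumed to be a plain function that takes a prompt and returns text, or raises on failure.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) backend in order; return (name, reply) from the first that works.

    The backends are hypothetical callables: put the local model first and
    the remote ones after it, so remote tokens are only spent as a fallback.
    """
    errors = []
    for name, call in backends:
        try:
            reply = call(prompt)
        except Exception as exc:  # any failure just moves on to the next backend
            errors.append(f"{name}: {exc}")
            continue
        if reply.strip():
            return name, reply
        errors.append(f"{name}: empty reply")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

The order of the list is the whole policy here: whatever comes first gets tried first, and the remote providers only see a prompt when everything before them has failed.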

New comment by muyuu in "Token efficient source code representation using virtual filesystem"

muyuu — Thu, 16 Apr 2026 12:08:13 +0000

sounds interesting and i don't want to sound negative, but i'm not going to install software of this nature that is closed source

New comment by muyuu in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"

muyuu — Mon, 13 Apr 2026 06:31:41 +0000

I don't know about users on reddit and discord, but the open models are essentially at SotA with a 3-4 month delay. That puts a hard backstop on what OpenAI and Anthropic can do before I personally can cut them off entirely without losing too much.

Granted the experience can be worse, esp. if you're using it very hands-off and not like a junior assistant who's extremely fast but doesn't know what he's doing at the architecture and strategy level. But even for that I'm relatively confident the Chinese will be competitive pretty soon, and they won't be too expensive. And we know this because we can see their current models and we know what it takes to run them.

Currently my Strix Halo computer, which cost me under £3k, can do a lot of LLM stuff that is perfectly useful. In some ways it's better than "cloud" models: I have models that essentially don't say "no" and I have relatively predictable setups. If you want to get fancy, you can right now rent compute to run extremely capable models like the latest ones from Kimi, GLM, Qwen and Minimax at full size, from providers that are not operating at a loss, and it won't be too expensive. You can pool resources to do the same locally. You can do stuff that cloud providers are unlikely to market, like distillation and abliteration, to serve your specific needs.

I'm very optimistic about open weights models just the way they are right now.

But I agree with you that OpenAI will likely play similar games to Anthropic and it could be soon.

New comment by muyuu in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"

muyuu — Sun, 12 Apr 2026 15:33:08 +0000

it may also be local/timezone effects

it has been reported that it behaves very differently depending on those factors, presumably because people are placed in best-effort buckets, who knows

New comment by muyuu in "Small models also found the vulnerabilities that Mythos found"

muyuu — Sun, 12 Apr 2026 06:37:16 +0000

OpenAI pulled the same trick with GPT3. It's amazing how well it's working judging by the comments I'm hearing from people I know exist. Because out there on social media, who knows.

New comment by muyuu in "Small models also found the vulnerabilities that Mythos found"

muyuu — Sun, 12 Apr 2026 02:14:38 +0000

I think the "Mythos" name is genius. The people at Anthropic make a bunch of claims and the public is expected to just believe them without any possibility of testing those claims or reproducing those results, and since so many people are invested in this saviour for the global economy, or in the industry in general, or in hype to feed their engagement-based income sources, there is faith to spare.

Meanwhile this mythical beast wasn't able to prevent the Bun vulnerability that exposed their code, let alone preclude the need to acquire that IP in the first place for presumably hundreds of millions of $$$, instead of coding a better replacement or a solution of its own.

What is real and measurable is that subscription plan users are getting a much degraded service for the same money through both open and hidden policies, while Anthropic moves compute to serve off-the-counter customers. The same people who come up with the most obvious and brazen lies to dismiss the clear degradation of their service also come up with this "security" justification for a move that looks just like good old market segmentation, which would perfectly fit the strong signs that they cannot afford to offer tokens at a competitive price in this market.

New comment by muyuu in "Reallocating $100/Month Claude Code Spend to Zed and OpenRouter"

muyuu — Thu, 09 Apr 2026 23:03:52 +0000

Have you tried Kilo? I'd like to hear from someone who has tried both to know how they compare.

New comment by muyuu in "System Card: Claude Mythos Preview [pdf]"

muyuu — Wed, 08 Apr 2026 00:49:06 +0000

Yep, I'm skeptical about their inference efficiency, given how much they're scrambling to reduce compute when they're already the most expensive by far (and in my experience not the best quality either).

However we cannot observe these things directly and it could be simply that OpenAI are willing to burn cash harder for now.

New comment by muyuu in "System Card: Claude Mythos Preview [pdf]"

muyuu — Wed, 08 Apr 2026 00:37:11 +0000

When did you last compare them? Codex right now is considerably better in my experience. Can't speak for Gemini.

New comment by muyuu in "Claude Code is locking people out for hours"

muyuu — Tue, 07 Apr 2026 23:40:59 +0000

VC money magic

New comment by muyuu in "Claude Code is locking people out for hours"

muyuu — Tue, 07 Apr 2026 18:01:21 +0000

no grand strategy behind that

they're looking to IPO in 2028 vs 2030 for OpenAI, who have raised more than double the funds

so they're willing to play fast and loose with the terms and conditions of existing customers trying to make it happen

those pockets must be drying up really fast

New comment by muyuu in "Claude Code is locking people out for hours"

muyuu — Tue, 07 Apr 2026 15:56:03 +0000

that is a separate issue indeed, but their comms make it rather obvious they are scrambling to reduce compute and they're just slashing their service selectively - with openclaw and max users being the first on the chopping block

New comment by muyuu in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

muyuu — Tue, 07 Apr 2026 09:31:45 +0000

I really wonder about this. Is it so bad that they cannot even disclose it? not even an optimistic lie in the ballpark of reality? it's not like they haven't been found cooking the truth repeatedly.

I look at the output of Kimi and the costs of running inference on it that i can replicate, and it isn't that bad, although admittedly i don't have to worry anywhere near as much about scaling it and about having to dedicate large amounts of compute to research and distillation on the back end. It's true that it's perhaps a step behind SotA vs January's Opus or current Codex, depending on what you do. But not by a lot. In fact it's leaps and bounds superior to the current subscription API experience. Together with GLM, Qwen and Minimax they are an amazing backstop just the way they are right now.

With all the layers of obfuscation it's hard to even know roughly how many input/output Opus tokens Claude subscriptions pay for. They'll give you some flippant arguments like "people were not looking at thinking so we're not showing you anymore" with a straight face. However, podcasts still insist Anthropic are "winning the AI war" (??). It really makes me wonder, because by no metric can I see them providing either the best value or the best quality, and let's not get started on consumer experience.

My intuition is that things must be really bad if they're willing to pull the kind of moves they're pulling right now. They're speedrunning people into understanding how important it is to be able to run your own generative AI infrastructure for reliability, thus becoming a very fancy but trustless throwaway solution factory.

I wonder if OpenAI will turn the screws similarly if/when their pockets start to dry up at a certain pace.

New comment by muyuu in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

muyuu — Mon, 06 Apr 2026 22:53:35 +0000

My impression is that Codex is vastly superior, but perhaps it's a matter of specific expertise on technologies used. It's also the case that for C/C++ some Chinese models do well enough that with my supervision I can have them get the work done.

I don't give them large tasks that i wouldn't be able to work on myself, so that's maybe part of it.

New comment by muyuu in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

muyuu — Mon, 06 Apr 2026 22:48:26 +0000

Perhaps that does sort most of the issues? I'm not convinced because some of them look deep and related to opaque pre-injection on their end.

New comment by muyuu in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

muyuu — Mon, 06 Apr 2026 22:16:26 +0000

The coin's PM is spamming you trivial gaslighting corporate slop, most of it barely edited.

New comment by muyuu in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

muyuu — Mon, 06 Apr 2026 22:11:23 +0000

The changes to reduce inference costs are intentional. The last thing you're going to do is have users linger on an older version that spends much more. This is essentially what's going on, with layers upon layers of social engineering on top of it.

New comment by muyuu in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

muyuu — Mon, 06 Apr 2026 21:41:40 +0000

If you get consistently nowhere near 50% then surely you know you're not throwing a fair coin? What would complaining to the coin provider achieve? Switch coins.

*typo
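To put a number on the coin argument above: a minimal sketch of an exact two-sided binomial test, assuming independent flips. The function name and the example counts are made up for illustration; nothing here is measured data.

```python
from math import comb


def two_sided_binomial_p(heads: int, flips: int, p: float = 0.5) -> float:
    """Exact two-sided p-value for seeing `heads` in `flips` tosses of a coin with P(heads) = p.

    Sums the probability of every outcome that is no more likely than the
    observed one (the usual "minimum likelihood" two-sided rule).
    """
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small relative tolerance so float noise doesn't drop outcomes tied with the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts: 20 heads in 100 flips of a supposedly fair coin.
print(two_sided_binomial_p(20, 100))
```

If the result is tiny, the "fair coin" hypothesis is not worth defending, and as said above the practical response is to switch coins rather than argue with the provider.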

New comment by muyuu in "In Japan, the robot isn't coming for your job; it's filling the one nobody wants"

muyuu — Mon, 06 Apr 2026 01:02:07 +0000

currently most industrialised countries are in a demographic decline, sometimes patched by immigration that will burden their economies long term much more than it will help them

China is in a deeper decline than Japan too, which will make its geopolitics volatile