Hacker News: onlyrealcuzzo

New comment by onlyrealcuzzo in "Microsoft starts canceling Claude Code licenses"

onlyrealcuzzo — Sat, 23 May 2026 15:45:32 +0000

#1 -> part of scaling is you can't review every single line of code.

LLMs don't really scale if you're still the bottleneck, or they only scale as much as you reviewing every line of code - that's not that much scaling...

So I try to only review certain parts, like making sure they aren't changing tests to allow architecturally broken code to slip through (because they regularly try, even when given explicit instructions not to). Or if I'm watching them make changes on my phone and see that they are clearly doing the exact opposite of what they're supposed to be doing (regularly if I'm watching).
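A minimal sketch of that kind of targeted check, assuming you have the output of `git diff --name-only` for the commit as a string. The helper names and the patterns for what counts as a test file are hypothetical and would need adjusting per repo. It only flags that tests were touched; deciding whether the change is legitimate is still on the reviewer.

```python
import re

# Hypothetical patterns for "this looks like a test file"; adjust per repo.
TEST_PATTERNS = [
    re.compile(r"(^|/)tests?/"),       # anything under a test/ or tests/ directory
    re.compile(r"(^|/)test_[^/]+$"),   # files named test_something
    re.compile(r"_test\.[^/]+$"),      # files named something_test.ext
]


def parse_name_only(diff_output):
    """Split the output of `git diff --name-only` into a list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]


def touched_tests(changed_paths):
    """Return the changed paths that look like test files."""
    return [
        path
        for path in changed_paths
        if any(pattern.search(path) for pattern in TEST_PATTERNS)
    ]
```

If `touched_tests` comes back non-empty for a commit that was only supposed to change implementation code, that's the commit to actually open and read.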

#2 -> if commits are small, GitHub's setup is good enough that you can review code on your phone.

#3 -> if they're huge, I can just review on my laptop at lunch or something.

Theoretically, all of this can be solved easily with orchestration and require minimal oversight.

If you're using LLMs to write code and you're carefully reviewing every line with a jade-handled magnifying glass, you're not really scaling - at least to the degree I'm interested in.

New comment by onlyrealcuzzo in "Microsoft starts canceling Claude Code licenses"

onlyrealcuzzo — Sat, 23 May 2026 14:56:20 +0000

Yes, you can do that with Claude Code.

Tell it what to do.

Commit, push to origin, review on GitHub.

Tell it to make changes, amend the commit, push --force-with-lease.
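Spelled out, that loop is roughly the following git commands. This is a sketch only: the function name, the `origin` remote and the `HEAD` refspec defaults are assumptions, and `-a` only picks up files git already tracks. In practice the agent runs these itself; the function just builds the argument lists so the sequence is explicit.

```python
def review_loop_commands(message, remote="origin", branch="HEAD"):
    """Build the git commands for the first push and for each later fix-up.

    Returns two lists of argv lists: (first_pass, fix_pass).
    """
    first_pass = [
        # Commit all tracked changes, then publish for review on GitHub.
        ["git", "commit", "-a", "-m", message],
        ["git", "push", remote, branch],
    ]
    fix_pass = [
        # Fold the follow-up changes into the same commit, keeping its message.
        ["git", "commit", "-a", "--amend", "--no-edit"],
        # Rewrites the remote branch, but refuses if someone else pushed
        # to it since the last fetch.
        ["git", "push", "--force-with-lease", remote, branch],
    ]
    return first_pass, fix_pass
```

Each inner list can be handed to `subprocess.run(cmd, check=True)` if you want to drive it from a script rather than from the agent.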

I'm attempting to make a memory safe language like Rust but with a substantially lower learning curve and added safety (but non-zero cost abstractions) fully with AI, almost entirely from my phone, commuting, getting coffee, walking the dog, between sets at the gym, replacing doom scrolling before bed and during lunch, etc.

Mostly to test how much LLMs can actually scale development.

Depending on how long it takes them to clean up some architectural slop in the MIR lowering phase, the results could either be very impressive or not.

From a purely cost basis perspective, it's hard to argue they aren't killing it.

But from a multiplier perspective, it's up in the air how great they are.

It's proven to be a really nice experiment, because much of what I wanted to solve with a language is the problems inherent to LLM development.

So at the self-hosting phase, I get a great opportunity to see if the language can actually deliver on what I dream of.

New comment by onlyrealcuzzo in "Microsoft starts canceling Claude Code licenses"

onlyrealcuzzo — Sat, 23 May 2026 12:27:27 +0000

I can tell you for a fact that Claude 4.7 was NOT doing what I told it to do on a pretty simple architectural refactor (in fact it did the clear and complete opposite, repeatedly), and that Codex did better and DeepSeek much better.

It was given very simple ways to verify success. It simply didn't do that and said it was at a good stopping point, despite moving in the WRONG direction, not even doing 1% of the task, and being told to see the task through to completion.

Meanwhile, Codex broke it down into 3 steps and just got it done...

No, "I'm going to give it to you straight, this is a large risky commit that could go sideways, so I'm just not going to do anything instead."

Claude worked on it for almost 200 commits over 2 weeks, and I typically needed to prompt it 3x to even TRY to make any progress instead of just wasting tokens ignoring me and telling me how big and risky it is.

Maybe Claude is just particularly terrible at this type of refactor. I'm not sure why that would be.

New comment by onlyrealcuzzo in "Microsoft starts canceling Claude Code licenses"

onlyrealcuzzo — Fri, 22 May 2026 21:01:02 +0000

MSFT and Apple are taking the same approach.

The frontier model space costs 1000x as much to develop as the small language models, and is only 1.5 years ahead.

Factually, the frontier models have not paid for themselves. So, if you're MSFT and Apple, you don't need to run in a race where even the winner loses massively.

You can try to train models 1.5 years behind that are highly likely to be profitable, given your market position.

The average person is lagging behind what AI is capable of by 3+ years anyway...

So you can save 1000x on training and 10x on inference and just use SOTA small models.

Why spend $5B training a model that's for sure not going to make $5B (after inference costs) when you can spend $5M building one that WILL make far more than that after inference costs?

New comment by onlyrealcuzzo in "Microsoft starts canceling Claude Code licenses"

onlyrealcuzzo — Fri, 22 May 2026 20:45:57 +0000

I rage canceled Claude today.

After 2 weeks of Claude getting progressively worse and worse, today was the final straw.

I don't care if they have a phone app. The model is COMPLETE garbage after you subscribe long enough and they think they've "got you".

I can't code on my phone if the model literally moves in the wrong direction and does the opposite of what I tell it to. If I wanted to make my code worse, I'd just randomly commit garbage. I don't need a mobile app for that.

New comment by onlyrealcuzzo in "DeepSeek makes the V4 Pro price discount permanent"

onlyrealcuzzo — Fri, 22 May 2026 20:29:55 +0000

The good thing is, we're only 2.5 years away from a top of the line MacBook having better local inference than CC Opus does today.

That's more than good enough if you're actually getting what CC Opus is capable of.

I've never been so excited for the future.

New comment by onlyrealcuzzo in "DeepSeek makes the V4 Pro price discount permanent"

onlyrealcuzzo — Fri, 22 May 2026 20:01:21 +0000

Claude Code has been so unbelievably terrible this entire week that I CANNOT believe it's the same model I was using weeks ago.

I am completely convinced they just screw over their customers after so much usage or so long of a subscription thinking they have them for life.

I have NEVER been so happy to cancel a subscription.

New comment by onlyrealcuzzo in "DeepSeek makes the V4 Pro price discount permanent"

onlyrealcuzzo — Fri, 22 May 2026 19:45:38 +0000

And they don't make the model worse once you have a subscription!

It doesn't matter how good Opus is if 2 months into your subscription they make it worse than GPT 3 to save money.

New comment by onlyrealcuzzo in "AI has a multiplying effect on existing technical skills"

onlyrealcuzzo — Fri, 22 May 2026 19:14:03 +0000

I'm trying to test if vibe coding can actually scale... And man is it painful.

AI is great at creating slop that almost works.

But, my god, it is terrible at following clear-as-day instructions on how to clean up slop.

It wrote 150k lines of code that almost works in 2 months. It's taken 1 month to delete about 2000 lines of broken architecture and fix it, and it still hasn't gotten it done, despite nonstop repeated efforts to do something not that hard.

I definitely could've fixed it in less time than I've spent prompting at this point (but no way I'd have gotten the other 150k lines). But doing it myself is not the point. It's to see if it can actually scale.

The answer is yes... But my god is it agonizing.

The part where it creates garbage that almost works is fun.

The inevitable cleanup is not.

And unfortunately I don't see this aspect materially improving in the short term.

If you want it to code you something of about 5-10k lines that's already been done 1000 times before, or is only slightly different, it's great.

Most people want more than that.

New comment by onlyrealcuzzo in "DeepSeek makes the V4 Pro price discount permanent"

onlyrealcuzzo — Fri, 22 May 2026 19:01:31 +0000

I just canceled Claude Code and Codex today.

RIP.

Claude literally refuses to finish tasks in auto mode and just keeps saying, now is a good stopping point, when it's 1% done (and doing the EXACT OPPOSITE of what I tell it).

Codex is barely better...

May as well pay 1/20th the price for DeepSeek.

Claude seems to have something that looks at how long you've been a customer and then just massively degrades quality.

When I started my subscription, Claude had none of these problems.

2 months into subscriptions Claude is completely unusable garbage, and Codex is not much better.

New comment by onlyrealcuzzo in "The memory shortage is causing a repricing of consumer electronics"

onlyrealcuzzo — Fri, 22 May 2026 14:35:58 +0000

You don't seem to understand supply and demand.

That does not help the demand side for AI data centers - which is the vast majority of the market...

New comment by onlyrealcuzzo in "The memory shortage is causing a repricing of consumer electronics"

onlyrealcuzzo — Fri, 22 May 2026 13:52:25 +0000

Not really...

If the new memory is needed for AI data centers, it doesn't matter if your existing MacBook doesn't need as much memory anymore.

New comment by onlyrealcuzzo in "Waymo pauses Atlanta service as its robotaxis keep driving into floods"

onlyrealcuzzo — Fri, 22 May 2026 11:48:06 +0000

Well, fortunately the rest of the planet is a lot more similar to Arizona than to Venus or the moon or the bottom of the ocean, and they're already doing quite well in like 25 other markets, so...

New comment by onlyrealcuzzo in "The memory shortage is causing a repricing of consumer electronics"

onlyrealcuzzo — Fri, 22 May 2026 11:33:34 +0000

AI data centers are eating like 80% of memory.

Making user space applications more memory efficient is not even going to be a rounding error on memory demand.

I am with you that it needs to happen, but it's not going to solve a memory shortage.

New comment by onlyrealcuzzo in "Waymo pauses Atlanta service as its robotaxis keep driving into floods"

onlyrealcuzzo — Thu, 21 May 2026 22:42:05 +0000

They were only in Arizona for a long time...

New comment by onlyrealcuzzo in "Was my $48K GPU server worth it?"

onlyrealcuzzo — Thu, 21 May 2026 22:40:09 +0000

> I figured worst case scenario I can sell them in the next year and only take a haircut as opposed to losing my entire investment.

It's going to be a non-trivial haircut. This stuff depreciates pretty fast.

New comment by onlyrealcuzzo in "Google's Antigravity Bait and Switch"

onlyrealcuzzo — Thu, 21 May 2026 16:30:03 +0000

> Underlying all of it is a "planning canvas" which is a network graph visualization of the codebase symbols, structures, and relations, where each node of the graph is a custom data-structure that captures a set of considerations.

Cool, I'm thinking along the same lines.

> but thus far they havent done it and i want this to exist, so i persist.

Cool, we are in the same boat [=

> If you're open to it, signup

I'll check it out.

New comment by onlyrealcuzzo in "Shunning AI is the human choice"

onlyrealcuzzo — Thu, 21 May 2026 16:25:46 +0000

Lol, seriously? Or are you trolling?

New comment by onlyrealcuzzo in "Shunning AI is the human choice"

onlyrealcuzzo — Thu, 21 May 2026 15:46:32 +0000

Did people hate the computer this much when it became a thing?

New comment by onlyrealcuzzo in "Google's Antigravity bait and switch"

onlyrealcuzzo — Thu, 21 May 2026 15:36:47 +0000

> Gemma 4 31b is better for coding than Gemini

Is there a fine-tuned Gemma coding model? I'd assume that would perform quite well.