Hacker News: BoorishBears

New comment by BoorishBears in "Apple Silicon and Virtual Machines: Beating the 2 VM Limit (2023)"

BoorishBears — Sun, 12 Apr 2026 00:54:54 +0000

Making software for other Apple products is pretty low on the list of reasons I use a MBP.

128GB of RAM and an M4 Max make for a very solid development machine, and the build quality is a nice bonus.

New comment by BoorishBears in "Reallocating $100/Month Claude Code Spend to Zed and OpenRouter"

BoorishBears — Thu, 09 Apr 2026 18:04:58 +0000

I still debate how much productivity I've gained from better AI compared to the loss from switching off WebStorm

But their tab complete situation is abysmal, and Supermaven got macrophaged by Cursor

New comment by BoorishBears in "GLM-5.1: Towards Long-Horizon Tasks"

BoorishBears — Thu, 09 Apr 2026 08:10:19 +0000

Their benchmark is chock-full of things like that: It's deeply flawed and is essentially rating how LLMs perform if you exert yourself trying to hold them entirely the wrong way.

New comment by BoorishBears in "App Store sees 84% surge in new apps as AI coding tools take off"

BoorishBears — Thu, 09 Apr 2026 05:06:37 +0000

Lots of people have tried this, and most recently TikTok is trying to become TikTok for games by showing zero-install games in the feed.

New comment by BoorishBears in "GLM-5.1: Towards Long-Horizon Tasks"

BoorishBears — Wed, 08 Apr 2026 18:16:58 +0000

This one's even more interesting

https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

Who knew Anthropic was this far behind???

New comment by BoorishBears in "GLM-5.1: Towards Long-Horizon Tasks"

BoorishBears — Wed, 08 Apr 2026 04:19:53 +0000

Is there really no rule that discourages spending 99% of your interactions with HN peddling some useless slop benchmark?

New comment by BoorishBears in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

BoorishBears — Tue, 07 Apr 2026 02:53:10 +0000

I hope that Anthropic continues to do well and coding agents in general continue to progress... but I also hope Claude Code implodes dramatically and completely so we can get a ground-up rebuild with sound engineering.

Every week it seems like we're getting closer.

Bonus: A high profile case might end people fixating on how long they can go without writing any code. Which makes about as much sense as a mechanic fixating on how long they go between snapped bolts without a torque wrench.

New comment by BoorishBears in "OpenScreen is an open-source alternative to Screen Studio"

BoorishBears — Sun, 05 Apr 2026 04:43:47 +0000

> A $30/month subscription is indeed too much, but I see it as a one time payment for that month when I release something, then I pause the subscription. I need it rarely, very few videos need zooming and motion.

If I think something is worth the money, I typically don't need to actively decide to pause the subscription each time I use it.

New comment by BoorishBears in "Google releases Gemma 4 open models"

BoorishBears — Thu, 02 Apr 2026 18:41:04 +0000

Well, specifically a congressperson got it to hallucinate stuff about them, then wrote an angry letter

But I checked and it's there... but in the UI web search can't be disabled (presumably to avoid another egg on face situation)

New comment by BoorishBears in "Google releases Gemma 4 open models"

BoorishBears — Thu, 02 Apr 2026 18:36:00 +0000

It does not matter at all, especially when talking about Qwen, who've been caught on some questionable benchmark claims multiple times.

New comment by BoorishBears in "Google releases Gemma 4 open models"

BoorishBears — Thu, 02 Apr 2026 18:32:47 +0000

Benchmarks are a pox on LLMs.

You can use this model for about 5 seconds and realize its reasoning is in a league well above any Qwen model, but instead people assume benchmarks that are openly getting used for training are still relevant.

New comment by BoorishBears in "The OpenAI graveyard: All the deals and products that haven't happened"

BoorishBears — Thu, 02 Apr 2026 00:16:57 +0000

India in this context is a synecdoche for scaling consumer vs Anthropic's more enterprise-y route, but yes that's pretty much why we didn't get 4.5 with reasoning. Without reasoning, 4.5 had no future.

From Sam Altman himself:

> We had this big GPU crunch. We could go make another giant model. We could go make that, and a lot of people would want to use it, and we would disappoint them. And so we said, let’s make a really smart, really useful model, but also let’s try to optimize for inference cost. And I think we did a great job with that.

4.5 scaled into a unified reasoning model would have been an incredible model. It beat GPT-5 on accuracy and hallucinations without reasoning (!)

It just wouldn't have worked for powering things like ChatGPT Go's rollout and loginless chatgpt.com, so they dropped it.

(And if you want, you could argue it's the compute crunch that didn't let them do both... but Anthropic had to make the same choices at the time and went in the other direction.)

New comment by BoorishBears in "The OpenAI graveyard: All the deals and products that haven't happened"

BoorishBears — Wed, 01 Apr 2026 20:43:57 +0000

Except they already did this: if they had scaled 4.5 with RL, 5 would probably have been the leap we expected

If anything 4.5 being abandoned so they could sell India a $3 a month subscription was the first crack in The Box

New comment by BoorishBears in "OpenAI demand sinks on secondary market as Anthropic runs hot"

BoorishBears — Wed, 01 Apr 2026 20:24:49 +0000

I suspect GPT-5 models are sparser and/or smaller than Opus which is why they can afford to give away so much usage.

New comment by BoorishBears in "EmDash – A spiritual successor to WordPress that solves plugin security"

BoorishBears — Wed, 01 Apr 2026 18:47:29 +0000

Why would you gut the credibility of the project for that tagline then: why not skip mentioning agents?

You even open the article by linking the toy project where you used agents to "recreate Next in a week" and released with critical vulnerabilities.

New comment by BoorishBears in "Claude Code's source code has been leaked via a map file in their NPM registry"

BoorishBears — Wed, 01 Apr 2026 17:14:40 +0000

So am I.

That project you quoted is the one with that as its new description. Soon it'll just be [new thing] that happens to use the stars as social proof... in fact when I look again:

> The fastest repo in history to surpass 100K stars. Better Harness Tools that make real things done. Built in Rust using oh-my-codex.

They started a new project that justifies the same repo and scrapes a little credibility off of Claude Code. The intent is not an actual rewrite but to bolster what will be their own personal project trying to compete with OpenCode and co.

The grifter is already pasting references to WSJ articles about themselves in the Readme

New comment by BoorishBears in "Claude Code's source code has been leaked via a map file in their NPM registry"

BoorishBears — Wed, 01 Apr 2026 00:12:49 +0000

That's not the actual plan.

"I have a popular repo, but the content will likely be removed and I won't have personally gained from the saga: how can I fix the part where I didn't profit?"

"Eureka! I'll remove the content preemptively, then come up with a backstory that justifies reusing the now empty repo for building the umpteenth coding harness! And I can even claim fuzzy ties to Claude Code!"

Hence the new description:

> The fastest repo in history to surpass 50K stars, reaching the milestone in just 2 hours after publication. Better Harness Tools, not merely storing the archive of leaked Claude Code but also make real things done. Now rewriting in Rust.

New comment by BoorishBears in "Claude Code's source code has been leaked via a map file in their NPM registry"

BoorishBears — Tue, 31 Mar 2026 23:42:38 +0000

"We" can be used to refer to people in general, and we know because Anthropic published a post called "Detecting and preventing distillation attacks" a month ago, while calling out 3 AI labs for large-scale distillation.

https://www.anthropic.com/news/detecting-and-preventing-dist...

New comment by BoorishBears in "Slop is not necessarily the future"

BoorishBears — Tue, 31 Mar 2026 21:25:11 +0000

I think some people are misunderstanding your point.

Yes, some people left to their own devices would take twice as long to ship a product half as buggy only to find out the team that shipped early has taken a massive lead on distribution and now half the product needs to be reworked to catch up.

And some people left to their own devices will also ship a buggy mess way too early to a massive number of people and end up with zero traction or validation out of it, because the bugs weren't letting users properly experience the core experience.

So we've established no one is entirely right, no one is entirely wrong, it's yin/yang and really both sides should ideally exist in each developer in a dynamic balance that changes based on the situation.

But there's also a 3rd camp that's the intersection of these: You want to make products that are so good or so advanced *, that embracing the craft aspect of coding is inherent to actually achieving the goal.

That's a frontend where the actual product is well outside typical CRUD app forms + dashboard and you start getting into advanced WebGL work, or complex non-standard UI state that most LLMs start to choke on.

Or needing to do things quicker than the "default" (not even naive) approach allows for UX reasons. I ran into this using Needleman-Wunsch to identify UI elements on return visits to a site without an LLM request adding latency: to me that's the "crafty" part of engineering serving an actual user need. It's a completely different experience getting near instant feedback vs the default today of making another LLM request.
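
For context, Needleman-Wunsch is a dynamic-programming global sequence alignment. The comment doesn't say how elements were encoded, so the following is only a minimal sketch under assumptions: each UI element is stored as a list of tokens along its DOM path, and the names `nw_score` and `best_match` plus the scoring weights are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def nw_score(a: Sequence[str], b: Sequence[str],
             match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two token sequences.

    Keeps only two rows of the DP table, since only the score is needed.
    """
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + step,   # align a[i-1] with b[j-1]
                          prev[j] + gap,        # gap in b
                          curr[j - 1] + gap)    # gap in a
        prev = curr
    return prev[len(b)]


def best_match(stored: Sequence[str],
               candidates: List[Sequence[str]]) -> Sequence[str]:
    """Pick the candidate path that aligns best with the stored path."""
    return max(candidates, key=lambda c: nw_score(stored, c))


# Hypothetical data: the path saved on a previous visit, and two paths
# found on the current page (one gained a wrapper <div>).
stored = ["html", "body", "main", "form", "button.submit"]
moved = ["html", "body", "main", "div", "form", "button.submit"]
other = ["html", "body", "nav", "a.logo"]

print(nw_score(stored, moved))  # 4: five matches, one gap
print(nw_score(stored, other))  # -1
print(best_match(stored, [other, moved]) == moved)  # True
```

The alignment tolerates small structural drift (an inserted wrapper costs one gap) and runs locally in time proportional to the product of the two path lengths, which is what makes it viable where a network round trip is not.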

And it's this 3rd camp's feedback on LLM development that people in the 1st camp wrongly dismiss as being part of the 2nd craft-maxxed group. For some use cases, slop is actually terminal.

Intentionally contrived example, but if you're building a Linear competitor and you vibecode a CRDT setup that works well enough, but has some core decisions that mean it'll never be fast enough to feel instant and frontend tricks are hiding that, but now users are moving faster than the data and creating conflicts with their own actions and...

You backed yourself into a corner that you don't discover until it's too late. It's only hypervigilance and strong taste/opinion at every layer of building that kind of product that works.

LLMs struggle with that kind of work right now and what's worrying is, the biggest flaw (a low floor in terms of output quality) doesn't seem to be improving. Opus 4.6 will still try to dynamically import random statements mid function. GPT 5.3 tried to satisfy a typechecker by writing a BFS across an untyped object instead of just updating the type definitions.
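
To make the typechecker anecdote concrete, here is a hypothetical reconstruction of the pattern, not the actual output. The original language isn't stated, so this is rendered in Python with `TypedDict`, and every name in it (`find_key_bfs`, `User`, `Profile`, `get_email`) is invented for illustration.

```python
from collections import deque
from typing import Any, Optional, TypedDict


def find_key_bfs(obj: Any, key: str) -> Optional[Any]:
    """The workaround: breadth-first search of an untyped blob for a key.

    It type-checks because everything is Any, but it returns whichever
    matching key is shallowest, not necessarily the intended one.
    """
    queue = deque([obj])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            if key in node:
                return node[key]
            queue.extend(node.values())
        elif isinstance(node, list):
            queue.extend(node)
    return None


# The fix: describe the shape once and let the checker verify the access.
class Profile(TypedDict):
    email: str


class User(TypedDict):
    id: int
    profile: Profile


def get_email(user: User) -> str:
    return user["profile"]["email"]


payload: User = {"id": 1, "profile": {"email": "a@example.com"}}
print(find_key_bfs(payload, "email"))  # a@example.com
print(get_email(payload))              # a@example.com

# The search silently picks the wrong value once the shape changes.
ambiguous = {"email": "billing@example.com",
             "profile": {"email": "a@example.com"}}
print(find_key_bfs(ambiguous, "email"))  # billing@example.com
```

Both versions return the same value on the happy path, which is why this kind of output passes review at a glance; only the typed version fails loudly when the data shape drifts.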

RL seems to be driving the floor lower actually as the failure modes become more and more unpredictable compared to even GPT 3.5 which would not even be "creative enough" to do some of these things. It feels like we need a bigger breakthrough than we've seen in the last 1-2 years to actually get to the point where it can do that "Type 3" work.

* good/advanced to enable product-led growth, not good/advanced for the sake of it

New comment by BoorishBears in "Claude Code runs Git reset –hard origin/main against project repo every 10 mins"

BoorishBears — Mon, 30 Mar 2026 02:01:13 +0000

Here in SF I talk to people all day who see this as a feature, not a bug, and that's the persona Claude Code and Codex are selling to.

It started out as a thought experiment: "why should we care about the files if AI is going to do the edits?" Then, as Opus got better and the hype built up, the rhetorical part of that dropped, and now there are plenty of people who swear they don't write code at all anymore and don't see why anyone would.

I think we're in a feedback loop caused by the fact you can totally get away with not writing code anymore for some reasonably complex topics. But that doesn't account for the long term maintainability of the result, and it doesn't account for people who think they're not writing code, but are relying heavily on the fact we haven't fully magicked away the actual code. They're watching the agents like a hawk, doing small bits and pieces at a time, hitting stop when it starts thinking about the wrong thing, etc.

My worry is the market taking the wrong lesson out of the trends and prematurely trying to force the agent-first future well before the tools or the people are ready.