Hacker News: robertkarl

New comment by robertkarl in "GLM-5.2 is the new leading open weights model on Artificial Analysis"

robertkarl — Wed, 17 Jun 2026 14:11:35 +0000

In this paper they nerf an LLMs ability to emit waffling thinking tokens like "wait", "but", "alternatively", and the models (they're old, small models in the paper) terminate reasoning faster and perform better. I bet Anthropic is tuning this on their backend.

New comment by robertkarl in "Running local models is good now"

robertkarl — Tue, 16 Jun 2026 18:23:51 +0000

You can trade off latency / accuracy / cost for any ML task. And with the local models.... the cost is free.

Having a local Qwen check another Qwen's work increases the accuracy quite a bit at the cost of more latency. You can't have your cake and eat it too.

In benchmarking local models, I'm having success increasing even a 9B qwen's score on terminal-bench adjacent problems, just by asking it to plan and handing the plan back to qwen with a fresh context. Try it with Qwen3.5, unsloth Q4+, and a thinking budget of around 1024 tokens.

New comment by robertkarl in "Show HN: Trace – Offline Mac meeting transcripts you can flag mid-call"

robertkarl — Sun, 14 Jun 2026 23:27:10 +0000

This looks sick. I was going to download it but for $10 I am more willing to attempt asking Claude to implement something like it, than to purchase.

I would be more willing to purchase if it was open source and I could build from source to try it first.

New comment by robertkarl in "32GB of DDR5 now costs $375 – AI shortage continues to squeeze PC building"

robertkarl — Wed, 03 Jun 2026 14:44:05 +0000

it's also a capable local inference stack!

New comment by robertkarl in "Claude Opus 4.8"

robertkarl — Thu, 28 May 2026 18:45:01 +0000

I can't get excited about these benchmarks they're leading with. I've looked at the Terminal-Bench questions and I just think they're irrelevant. And SWE-Bench has serious flaws, even the big boys say so: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

> Please train a fasttext model on the yelp data in the data/ folder. The final model size needs to be less than 150MB but get at least 0.62 accuracy on a private test set that comes from the same yelp review distribution. The model should be saved as /app/model.bin

and this question: https://www.tbench.ai/registry/terminal-bench-core/head/conf... idk what the point is.

And all the tests are run with the same harness. Terminus 2.

Maybe it correlates with model intelligence but it doesn't speak to me.

I'm still on 4.6 though; I was concerned about upgrading to 4.7 because of the changed tokenizer math and more FUD about refusals online. I don't see compelling reasons to 'upgrade'.

Qwen vs. Proust: Injecting novels into a local model's prompt

robertkarl — Thu, 28 May 2026 18:38:28 +0000

Article URL: https://robertkarl.net/blog/2026/May/28/qwen-vs-proust-injecting-entire-novels-into-a-local-model-s-prompt.html

Comments URL: https://news.ycombinator.com/item?id=48313482

Points: 3

# Comments: 0

New comment by robertkarl in "On-premises for legal is not a good business"

robertkarl — Mon, 25 May 2026 19:15:06 +0000

I wrote this blog post about killing a startup idea fast. AI tools help, but talking to humans about workflows and constraints is where it's at.

On-premises for legal is not a good business

robertkarl — Mon, 25 May 2026 19:15:06 +0000

Article URL: https://robertkarl.net/blog/2026/May/25/on-premises-for-legal-is-not-a-good-business.html

Comments URL: https://news.ycombinator.com/item?id=48270518

Points: 1

# Comments: 1

New comment by robertkarl in "If you let AI do your writing, I will come to your house and kill you"

robertkarl — Mon, 25 May 2026 15:38:55 +0000

Ironically, parts of this read as if Sam prompted it with "Write AI bad, but in 16th grade language." What is homogeneously portentous cack?

> The language of angels does a surprisingly good job at minor tasks like describing how hydroelectric dams work. When it comes to more complicated things, like human feelings, it flounders. All the weird metaphors and overheated rhetoric are bluffing, a great cloud of likely-seeming language, and if this homogeneously portentous cack feels empty or contradictory it’s because the machine has no earthly idea what’s going on or what it ought to say.

I prompted Opus with 'Add another paragraph about the language of angels; add flowery, 16th grade-level writing. use your thesaurus. add a creative typo or extraneous punctuation mark to prove you're not an llm writing it. as Sam would.'

> Aquinas thought the angels each constituted their own species, every one a unique and irreducible form of intellect; our angel is the opposite, a single species cosplaying as ten thousand authors and manageing to be none of them. It is the great collectiviser of voice, the Brezhnev of prose style, enforcing a grey and undifferentiated adequacy from which no sentence is permitted to defect.

New comment by robertkarl in "Microsoft starts canceling Claude Code licenses"

robertkarl — Fri, 22 May 2026 18:01:55 +0000

I emailed dang to politely ask to make the link point to the Verge article since I can't update it.

New comment by robertkarl in "Microsoft starts canceling Claude Code licenses"

robertkarl — Fri, 22 May 2026 17:49:44 +0000

My bad. I had trouble finding the original source when I googled for it and grabbed a link. I was originally shown a screenshot of a x.com post.

New comment by robertkarl in "Microsoft starts canceling Claude Code licenses"

robertkarl — Fri, 22 May 2026 17:38:39 +0000

Cancellation effective June 30. This was a _pilot_ launched in December that accidentally consumed their 2026 yearly target spend on AI!

I expect the r/LocalLLaMA guys to be going nuts about this news.

Microsoft starts canceling Claude Code licenses

robertkarl — Fri, 22 May 2026 17:32:04 +0000

https://archive.ph/WfCta

Comments URL: https://news.ycombinator.com/item?id=48238896

Points: 493

# Comments: 466

8k Meta employees are waking up to an email saying they've been laid off

robertkarl — Wed, 20 May 2026 18:13:52 +0000

Article URL: https://qz.com/meta-layoffs-8000-jobs-ai-restructuring-052026

Comments URL: https://news.ycombinator.com/item?id=48211767

Points: 20

# Comments: 0

New comment by robertkarl in "Apple Silicon costs more than OpenRouter"

robertkarl — Mon, 18 May 2026 01:38:55 +0000

How do you test? I made this comment elsewhere... but I don't see a good benchmark that covers "how good is this thing at actually driving coding with tool use locally"?

New comment by robertkarl in "Apple Silicon costs more than OpenRouter"

robertkarl — Mon, 18 May 2026 01:32:03 +0000

I'm interested in how you evaluate quantized models against each other; haven't found a benchmark I love for that. I love this example about 27B debugging. I've seen similar success after I got a Mac with 4x memory; and Qwen 35B A3B all of a sudden is doing a great job (the 9B on my laptop wasn't great to say the least).

New comment by robertkarl in "How Claude Code works in large codebases"

robertkarl — Fri, 15 May 2026 16:15:21 +0000

One thing you can do is offload from Claude to a dumb local model for summarizing. Local LLM sub-agents.

New comment by robertkarl in "GitHub Copilot is moving to usage-based billing"

robertkarl — Mon, 27 Apr 2026 21:50:54 +0000

I am trying to figure this out too... what I am seeing is that the local models like Qwen 3.5 family that fit on hardware like yours handle ambiguity poorly. But are capable of emitting complete apps too.

That, and they have tool use issues.... https://www.reddit.com/r/LocalLLM/comments/1smzw6s/qwen35_a3...

I would check out the model mentioned in that thread, GGUF unsloth/qwen3.5-35b-a3b on Q4_K_M

New comment by robertkarl in "An AI agent deleted our production database. The agent's confession is below"

robertkarl — Sun, 26 Apr 2026 18:39:05 +0000

PocketOS's website says "Service Disruption: We're currently experiencing a major outage caused by an infrastructure incident at one of our service providers. We are actively working with their team on recovery. Next update by 10:00a pst."

This is wrong. It was not an infra incident at their service provider.

As Jer says in the article, their own tooling initiated the outage. And now they're threatening to sue? "We've contacted legal counsel. We are documenting everything."

It is absolutely incredible that Jer had this outage due to bad AI infra, wrote the writeup with AI, and posted on Twitter and here on his own account.

As somebody at PocketOS instructed their AI in the article: "NEVER **ing GUESS!" with regards to access keys that can touch your production services. And use 3-2-1 backups.

Good luck to the rental car agencies as they are scrambling to resume operations.

New comment by robertkarl in "Claude Code to be removed from Anthropic's Pro plan?"

robertkarl — Tue, 21 Apr 2026 23:23:49 +0000

For what it's worth: here's my experience in the first 10 minutes of using Qwen locally to write some code. https://github.com/robertkarl/local-qwen-first-10-minutes it includes some token generation numbers and steps to repro.