Hacker News: sosodev

New comment by sosodev in "GLM-5.2 is the new leading open weights model on Artificial Analysis"

sosodev — Wed, 17 Jun 2026 18:16:27 +0000

Note that AA's coding index is only made up of two benchmarks: Terminal-Bench Hard and SciCode. I'm skeptical that it makes a good coding index. It ranks Gemma 4 31B above Deepseek V4 Flash. Having used both of those models for a broad variety of coding tasks I would choose Deepseek every day.

New comment by sosodev in "Running local models is good now"

sosodev — Wed, 17 Jun 2026 03:23:59 +0000

Petsitter's default tricks doesn't seem to do much for Qwen3.6, right? JSON mode could be useful I suppose, but that's not really going to make it better at writing code. Do you have any other example tricks? I'm having a hard time understanding how I would apply them.

New comment by sosodev in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

sosodev — Wed, 17 Jun 2026 03:16:23 +0000

Meh. My server can run these models for neglible power draw (like ~130W fully maxed out). That's with ~30 tok/s which isn't that bad. I do agree that they're still nowhere near as good as the frontier models though. I do lean on those when I need to get something done with better quality or at a faster speed.

I've also been using Deepseek V4 pro/flash for some work stuff and I do find them to be much closer to frontier capability. I may try running flash at home soon for very patient edits. :)

New comment by sosodev in "My Homelab AI Dev Platform"

sosodev — Wed, 17 Jun 2026 03:13:10 +0000

Any artifacts or blogs I can check out? I'm curious how you manage to make them all useful in parallel. I have a hard enough time getting one instance of Qwen3.6-27B being useful full time haha.

New comment by sosodev in "Running local models is good now"

sosodev — Tue, 16 Jun 2026 15:37:02 +0000

I think this is overselling their capabilities. I've used Gemma 4 and Qwen 3.6 quite a bit on my strix halo home server. They're great models and the dense variants are significantly better, but they're still very far behind the frontier. If you boot up Gemma 4 MoE and OpenCode/Pi and expect to perform anything like Claude Code or Codex you're going to be very disappointed.

New comment by sosodev in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

sosodev — Mon, 15 Jun 2026 17:34:02 +0000

My strix halo board is feeling more useful and less toylike with the recent performance gains combined from MTP, better quantization, and generalized performance improvements across the stack. For example, I can run Unsloth's Gemma4-31B 4-bit QAT model with around 30tg and 200pp. I don't find that to be too slow at all. Particularly because it's nearly full accuracy and good enough for a lot of different stuff I throw at it.

I think it also helps that I'm using my machine to do home server stuff. It excels at all of the traditional workloads. Then I can lean on the AI to help with automation here and there. I find it deeply satisfying.

New comment by sosodev in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

sosodev — Mon, 15 Jun 2026 17:22:23 +0000

The problem with this question is that it encompasses a huge spectrum of capabilities and expectations. If you can only run an 8B model and expect it to be good at vibe coding / one shotting things you're going to have a bad time.

If you're able to run a model on the scale of ~30B, you can find that with a reasonably scoped and well defined task they do very well. I've found both Gemma4-31B and Qwen3.6-27B to be the best in this range at the moment. You can swap in the MoE models for faster inference, but they are noticeably worse at most tasks. They can one-shot / vibe code tasks with small scope, but still do much better with guidance.

If you really want frontier-like capabilities, you'll probably need at least 128GB of memory and either huge compute or a lot of patience. Most people just don't have either the money or the patience to make these local models work.

The patience required for local model usage goes far beyond just waiting for tokens though. It takes a lot of effort to get things configured and working properly for your workflow and hardware.

New comment by sosodev in "My Homelab AI Dev Platform"

sosodev — Mon, 15 Jun 2026 16:49:00 +0000

I think it heavily depends on what you're asking the model to do. Qwen3.6, both 27B and 35B-A3B, do agentic tool use very well. Their decision making is sus, but the dense model is decent in that way. A 4-bit quant for either of those can run on many home systems with a bit of configuration.

The biggest issue I've noticed is that the chat templates for open models are really hit or miss. The default Qwen3.6 chat template mostly works these days, but depending on your workload it may cause major issues. There are plenty of "fixed" chat templates on hugging face, but people report mixed success. It really seems to depend a lot on what the tool you're using expects.

New comment by sosodev in "/architect: Reduce Fable tokens by 80%, Fable orchestrates/reviews, Codex builds"

sosodev — Fri, 12 Jun 2026 23:45:18 +0000

I don’t know why you’re getting downvoted. It’s true. Averaged across a wide variety of benchmarks Fable is the only Anthropic model that performs better than GPT 5.5 xhigh.

New comment by sosodev in "Claude Fable 5"

sosodev — Tue, 09 Jun 2026 17:55:07 +0000

I wonder if model distillation will continue to work as well as it has. Given hidden reasoning, the ever expanding number of expected capabilities, a serious compute shortage, the looming possibility of model collapse, and dramatically higher API costs I would guess that it's getting much harder to do.

New comment by sosodev in "Claude Fable 5"

sosodev — Tue, 09 Jun 2026 17:48:04 +0000

Do you have any resources to share regarding independent expert training? I was under the impression that it's not feasible.

Software Has Long Been Beyond Our Understanding

sosodev — Fri, 05 Jun 2026 00:29:12 +0000

Article URL: https://kylemcgough.com/blogs/software-has-long-been-beyond-our-understanding

Comments URL: https://news.ycombinator.com/item?id=48406534

Points: 3

# Comments: 1

New comment by sosodev in "The newest Instagram “exploit” is the goofiest I've seen"

sosodev — Mon, 01 Jun 2026 16:47:18 +0000

Support requests have always been the weakest link in the security chain for big corps. I've had accounts of mine turned over with 2FA disabled by humans before. I guess we shouldn't be surprised that the LLMs are doing the same thing.

The simple fact that 2FA can be removed by low level support staff drives me mad. It defeats the whole purpose of the process.

New comment by sosodev in "Nvidia Cosmos 3"

sosodev — Mon, 01 Jun 2026 15:51:42 +0000

Most of the examples they've chosen seem.. not good? What an odd mix of bad game engine and AI slop. I can't imagine that this stuff makes good training data for real-world applications.

New comment by sosodev in "OpenRouter raises $113M Series B"

sosodev — Sat, 30 May 2026 20:31:29 +0000

I’m not sure I understand your question. Every interaction you have with a model in a web page does the same thing in the backend. It feeds the whole conversation history, perhaps with a bit of processing, into the model so it can process the next generation. Filling the context window is how these models retain coherence.

New comment by sosodev in "Citing 'severe' math deficits, UC faculty demand a return to SAT tests for STEM"

sosodev — Thu, 28 May 2026 16:57:49 +0000

Isn’t this contradictory to your point? They dropped it, collected data, and then reverted when the evidence suggested they made the wrong choice.

New comment by sosodev in "No more JetBrains products for me"

sosodev — Mon, 18 May 2026 21:06:48 +0000

I also cut off JetBrains recently after a long relationship with their tools. I agree with the points made by the author. The tools are clunky resource hogs for seemingly no reason. I was really excited when JetBrains announced Fleet and promised a lightweight UI with the old analysis engines as lighter background processes. It seemed like it would solve a lot of the problems I had with their IDEs. That never materialized though. They say that Fleet integrated into Air, but Air is not an IDE. So now we're just left with the diminishing value of their traditional IDE offering and some floundering attempts to get into the AI market. What a shame.

New comment by sosodev in "Figure 03 robot work shift livestream [video]"

sosodev — Fri, 15 May 2026 03:40:46 +0000

Why is nobody on HN talking about this? Unless this is a very advanced fake, it seems like the first proof that humanoid robots are actually capable of real labor.

New comment by sosodev in "The Rise of the Bullshittery"

sosodev — Tue, 12 May 2026 20:46:15 +0000

If Wall Street was so wise they would only reward meaningful layoffs. Laying off 10% of a company by stack ranking every team accomplishes nothing. Particularly if the company just hires the same number of cut people next quarter.

If a tree has a dead branch, you cut it off. Cutting off 10% of the leaves evenly distributed among branches will remove some dead leaves, but it leaves the source of the problems unaddressed.

New comment by sosodev in "US National Debt Surpasses GDP"

sosodev — Thu, 30 Apr 2026 22:56:57 +0000

Well put. I too am optimistic that, in the long term, good will prevail and we'll be stronger because of the suffering. I also agree that there's happiness and meaning to be found in presence and local life. However, it feels quite hard to let it wash over me when I spend so much time at work. The hippie lifestyle is very tempting, but I want stability and a family.