Hacker News: hedgehog

New comment by hedgehog in "What it feels like to work with Mythos"

hedgehog — Tue, 09 Jun 2026 18:11:42 +0000

Work duration is also not that valuable a measure; you're usually better off defining the process yourself in code and having that code delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts, but on the other hand it's easier to do your own model routing, and I haven't seen any way for the normal chatbots to maintain coherence on streams of work measured in days and weeks.

New comment by hedgehog in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"

hedgehog — Mon, 08 Jun 2026 23:03:50 +0000

For scale, though, if three or four chips that size can replicate a Qwen 27B experience, that'll be quite useful.

New comment by hedgehog in "Stop the Apple Music app from launching"

hedgehog — Mon, 08 Jun 2026 18:48:54 +0000

iTunes and iPhoto both. Given how good the tools are getting, and how much existing sample code is available, it seems likely someone will do a good job of reincarnating them in the near future. Apple broke the apps I used most on the Mac and then they added the bubblicious design crime UI, no thanks.

New comment by hedgehog in "Full Reverse Engineering of the TI-84 Plus Operating System"

hedgehog — Mon, 08 Jun 2026 18:39:03 +0000

Do you have plans to generate a buildable version of the sources? And do you know the original implementation language (C, maybe)?

New comment by hedgehog in "Show HN: I Derived a Pancake"

hedgehog — Mon, 08 Jun 2026 03:38:33 +0000

I want to like this, but it reads like Claude output. How much scrutiny did this get for accuracy?

New comment by hedgehog in "The LLM warnings Google fired Timnit Gebru over have all come true"

hedgehog — Thu, 04 Jun 2026 18:17:36 +0000

The scale of the data and the size of the models don't change the underlying issue: the whole construction of these models is to start with a maximum likelihood language sampler (pre-training) and then massage it into a maximum utility language sampler (post-training) with some eye towards risk management and policy compliance ("safety"). It takes work to make model output fit any particular idea of "correct", whether it's Elon's particular ideology, the US Civil Rights Act, Xi Jinping Thought, or writing clean C++. More data and weights increase the complexity of tasks that we're able to model, but they don't automatically make the output "better" on any given axis.

New comment by hedgehog in "Gemma 4 12B: A unified, encoder-free multimodal model"

hedgehog — Thu, 04 Jun 2026 08:17:13 +0000

Ryzen 395 is what I'm using; anything with 128GB+ of RAM accessible to the GPU should work fine for a 4-bit version of the model (so Spark or Mac Studio should be OK too).

New comment by hedgehog in "Gemma 4 12B: A unified, encoder-free multimodal model"

hedgehog — Thu, 04 Jun 2026 03:09:42 +0000

Same chip, with a 6-bit 35B and 8-bit KV cache I see about 500 tokens/s prefill and 55 tokens/s decode at 30k into the context window. MiniMax seemed to have a slightly lower token rate but was much, much less prone to 40k tokens of monologue before generating an answer. A pattern I like is to use a smaller model to do most execution and then a larger model to review transcripts and output and do any fixups and tooling improvements (this is all batch jobs, so all I care about is overall throughput).

New comment by hedgehog in "Gemma 4 12B: A unified, encoder-free multimodal model"

hedgehog — Thu, 04 Jun 2026 01:18:08 +0000

The 6-bit versions + 8-bit KV cache seem to save a good bit of memory without a significant loss of quality. The Qwen 35B is pretty fast in my testing, but MiniMax M2.7 230B is in some ways faster (way fewer tokens to arrive at an answer) even though it is much larger.
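
A back-of-the-envelope sketch of the memory side of this: weight storage is roughly parameter count times bits per weight, divided by 8. The function name and the decimal-GB convention are assumptions for illustration, and the estimate covers weights only (no KV cache, activations, or quantization scales), so real usage will be higher.

```go
package main

import "fmt"

// weightBytes estimates the memory taken by model weights alone:
// parameter count times bits per weight, divided by 8 bits per byte.
// It ignores KV cache, activations, and quantization metadata.
func weightBytes(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}

func main() {
	const gb = 1e9 // decimal gigabytes
	for _, bits := range []float64{8, 6, 4} {
		fmt.Printf("35B at %v-bit: %.2f GB\n", bits, weightBytes(35e9, bits)/gb)
	}
}
```

For a 35B model this prints 35.00, 26.25, and 17.50 GB for 8, 6, and 4 bits respectively.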

New comment by hedgehog in "MAI-Code-1-Flash"

hedgehog — Tue, 02 Jun 2026 21:07:44 +0000

Yes. Divide execution of a change into separate responsibilities. Designate the main chat as the "orchestrator", Opus. You designate a goal, then tell it to grind until it gets there using the following sub-agents in sequence:

1. Step execution (Sonnet): Work for 30 minutes / 100k tokens at the direction of the Orchestrator

2. Review (Opus): Scrutinize the previous step's work for errors and fidelity to the instructions, fix any problems, and record opportunities to improve the agent configuration + tools to reduce errors and token usage (record those to a file).

3. Self-improvement (Opus): Implement the highest impact self-improvement items that don't require user intervention.

Repeat: Until orchestrator session token budget exhausted (set it to 1M or whatever).

The underlying rationale is to keep each step manageable to maximize adherence to instructions and minimize cost (even cached tokens cost something). Prompt tokens are much cheaper than generated tokens, so to the extent Opus mostly reviews rather than drives, that saves a lot too. Self-improvement steps are very expensive but the improvements compound; if you're going to run a job for days or weeks it's way more expensive not to do them.
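
The loop described above can be sketched in code roughly as follows. This is a minimal illustration, not the actual Claude Code setup: the `Model` type, the `Run` function, the prompts, and the stub models are all hypothetical, and the improvement notes live in a string rather than a file.

```go
package main

import "fmt"

// Model is a hypothetical stand-in for a call to an LLM backend.
// It takes a prompt and returns the text output plus tokens consumed.
type Model func(prompt string) (output string, tokens int)

// Run loops execute -> review -> self-improve until the token budget is
// spent or maxRounds is reached. It returns completed rounds and tokens used.
// The budget is checked once per round, so the last round may overshoot it.
func Run(goal string, worker, reviewer Model, budget, maxRounds int) (rounds, used int) {
	notes := "" // improvement notes carried between rounds
	for rounds < maxRounds && used < budget {
		// 1. Step execution by the cheaper model.
		work, t := worker("Goal: " + goal + "\nNotes: " + notes)
		used += t
		// 2. Review by the larger model: find errors, record improvement ideas.
		review, t := reviewer("Review this work and list improvements:\n" + work)
		used += t
		// 3. Self-improvement: turn the recorded ideas into updated notes.
		notes, t = reviewer("Apply the highest impact improvements:\n" + review)
		used += t
		rounds++
	}
	return rounds, used
}

func main() {
	// Stub models so the sketch runs offline; each call costs a fixed token count.
	worker := func(p string) (string, int) { return "draft", 100 }
	reviewer := func(p string) (string, int) { return "notes", 50 }
	rounds, used := Run("ship the change", worker, reviewer, 1000, 10)
	fmt.Println(rounds, used) // prints "5 1000"
}
```

Keeping the loop in plain code means the token budget and the choice of model per step are ordinary parameters, which is what makes swapping in a local model for the worker straightforward.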

Edit: I do this in Claude Code with the Anthropic models as well as Qwen family models for offline use.

New comment by hedgehog in "SQLite is all you need for durable workflows"

hedgehog — Sat, 30 May 2026 02:34:25 +0000

It does; my experience has been that it adds code complexity, deployment complexity, and performance problems. There are some observability benefits, but there are other ways to solve that. It's possible there are workloads that fit it, but not anything I've personally worked on.

New comment by hedgehog in "Dynamic Workflows in Claude Code"

hedgehog — Thu, 28 May 2026 21:37:37 +0000

How granular is the control over the internal process?

In my experiments I've had some success modeling the work to be done as a DAG of typed artifacts with a combination of code + LLM doing decomposition, transforms, synthesis, and fitness checking to generate the output. It took me a lot of tries to arrive at that formula and it would be cool to have something more general. I also run part of it against local compute because it would be far beyond my budget to do it all on Opus, so something for that would be nice too.
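
One way to picture a DAG of typed artifacts is the sketch below. It is an assumption-laden illustration rather than the setup described above: `Artifact`, `Step`, and `Execute` are hypothetical names, each step is plain code where a real one might wrap a model call, and the only fitness check is that each dependency has the expected kind.

```go
package main

import "fmt"

// Artifact is a typed piece of intermediate or final output.
type Artifact struct {
	Kind string // e.g. "outline" or "draft"; checked on every edge
	Body string
}

// Step produces one artifact from the artifacts it depends on.
type Step struct {
	Name  string
	Needs map[string]string // dependency step name -> required artifact kind
	Run   func(in map[string]Artifact) Artifact
}

// Execute runs steps in dependency order and checks artifact kinds on each edge.
func Execute(steps []Step) (map[string]Artifact, error) {
	done := map[string]Artifact{}
	for len(done) < len(steps) {
		progressed := false
		for _, s := range steps {
			if _, ok := done[s.Name]; ok {
				continue // already produced
			}
			ready := true
			for dep, kind := range s.Needs {
				a, ok := done[dep]
				if !ok {
					ready = false
					break
				}
				if a.Kind != kind {
					return nil, fmt.Errorf("%s: %s has kind %q, want %q", s.Name, dep, a.Kind, kind)
				}
			}
			if ready {
				done[s.Name] = s.Run(done)
				progressed = true
			}
		}
		if !progressed {
			return nil, fmt.Errorf("cycle or missing dependency")
		}
	}
	return done, nil
}

func main() {
	steps := []Step{
		{Name: "draft", Needs: map[string]string{"outline": "outline"},
			Run: func(in map[string]Artifact) Artifact {
				return Artifact{Kind: "draft", Body: "expanded: " + in["outline"].Body}
			}},
		{Name: "outline", Run: func(map[string]Artifact) Artifact {
			return Artifact{Kind: "outline", Body: "1. intro"}
		}},
	}
	out, err := Execute(steps)
	fmt.Println(out["draft"].Body, err)
}
```

The steps are listed out of order on purpose: `Execute` keeps sweeping until every step has run, and reports an error if a sweep makes no progress.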

New comment by hedgehog in "Ruby vs. Java vs. TypeScript: my experience on building a Cowork DOCX plugin"

hedgehog — Thu, 28 May 2026 17:06:10 +0000

You know that, and I know that, but for someone who started working more recently the difference between CORBA and punch cards might be a little blurry because they're both so far back they've never seen either. It's like kids asking how the dinosaurs in LEGO Jurassic World were animated, because they don't move like real toys, and noting how much easier the 1993 live action Jurassic Park filming was because back then they could just film real dinosaurs. Feels weird, but makes sense from their perspective.

New comment by hedgehog in "What Is a Direct Attach Copper (DAC) Cable? (2021)"

hedgehog — Thu, 28 May 2026 02:09:09 +0000

I went from one dev machine to two at my desk, so I connected them via 25GbE. With about 2.8 GB/s TCP throughput and RDMA available I don't have to think too much about task placement or cross-traffic. (Specific hardware: Mellanox ConnectX-4 Lx cards + a DAC cable.)

New comment by hedgehog in "Tech CEOs are apparently suffering from AI psychosis"

hedgehog — Wed, 27 May 2026 17:07:41 +0000

Does that mean it can work 20% faster?

New comment by hedgehog in "A portentous reunion"

hedgehog — Wed, 27 May 2026 00:15:57 +0000

I've done some tool-assisted ports (including without original source); the work you already did is probably 1/4 of the way to a web-hosted Rust BattleTris.

New comment by hedgehog in "Norway's 2 petabytes of Huawei flash storage and LLM training"

hedgehog — Tue, 26 May 2026 03:37:55 +0000

That's enough resources to build on something like the Olmo 3 recipe but with a mix prioritizing their own data and post-training for their own tasks. If they build their own embedding model, index everything in the library, and train their model to query that data while answering historical, cultural, legal, and strategic questions from their perspective... Pretty interesting and likely useful. They won't beat Anthropic at dumping out React code but also there's no real reason to duplicate that.

New comment by hedgehog in "CVE-2026-28952: Apple macOS 26.5 Kernel Vuln found by Claude"

hedgehog — Tue, 26 May 2026 00:56:18 +0000

This was fixed in 26.5 as well as 15.7.7 etc.

https://app.opencve.io/cve/CVE-2026-28952

New comment by hedgehog in "Migrating from Go to Rust"

hedgehog — Mon, 25 May 2026 15:40:21 +0000

Pauses are a problem with heap size and structure, not allocation rate, because the pause is caused by GC code that is O(heap size). Generating garbage more slowly reduces the frequency of pauses but not their severity. This is an issue with most GCs to some degree: there are phases of collection where the GC stops execution, and the duration is proportional to how much work it has to do, which is based on how many objects and how much memory needs to be checked. "Concurrent" garbage collection is the approach of trying to reduce the pauses by doing more of the work while program execution continues. It's complicated and hard to get right, so Go's original GC was IIRC fully stop-the-world.

There are some fine points to the O(heap size): for example, it's clearly unnecessary for the GC to scan objects that do not themselves contain pointers, and work is somewhat proportional to the total number of objects. So the mitigations are things like combining numerous small objects into manually managed slices, coming up with ways to make the most numerous items pointer-free, etc.
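
As a sketch of what making the most numerous items pointer-free can look like in Go, the example below replaces a string field and a pointer link with integer indices. All type and method names are hypothetical, and whether this helps in a given program depends on the workload.

```go
package main

import "fmt"

// EventPtr carries pointers: the string header and the *EventPtr link
// both have to be traced by the collector for every live value.
type EventPtr struct {
	Name string
	Next *EventPtr
}

// Event is pointer-free: the name is an index into a shared table and the
// link is an index into the events slice, so the backing array of a
// []Event contains nothing for the collector to follow.
type Event struct {
	NameID uint32
	Next   int32 // index into the same slice, -1 for none
}

// Store keeps all events in one slice plus a small table of interned names.
type Store struct {
	names  []string
	ids    map[string]uint32
	events []Event
}

func NewStore() *Store { return &Store{ids: map[string]uint32{}} }

// Add appends an event and returns its index.
func (s *Store) Add(name string, next int32) int32 {
	id, ok := s.ids[name]
	if !ok {
		id = uint32(len(s.names))
		s.names = append(s.names, name)
		s.ids[name] = id
	}
	s.events = append(s.events, Event{NameID: id, Next: next})
	return int32(len(s.events) - 1)
}

// Name resolves an event index back to its string.
func (s *Store) Name(i int32) string { return s.names[s.events[i].NameID] }

func main() {
	s := NewStore()
	a := s.Add("click", -1)
	b := s.Add("click", a)
	fmt.Println(s.Name(b), s.events[b].Next, len(s.names)) // prints "click 0 1"
}
```

The name table and map still hold pointers, but there is one entry per distinct name rather than one per event, so the pointer-bearing part of the heap stays small.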

I learned a bit about this when an analytics workload I had ended up with unacceptable pauses (I think over 1 second). Go's GC is more sophisticated now, but I think in any GC runtime you have the same considerations to some degree. Some of the best writing at the time was by Gil Tene, one of the principal authors of the C4 concurrent collector at Azul Systems. A starting point is here:

https://groups.google.com/g/golang-dev/c/GvA0DaCI2BU/m/SmEel...

New comment by hedgehog in "Migrating from Go to Rust"

hedgehog — Mon, 25 May 2026 15:10:58 +0000

Possibly in your specific application, but usually there are a handful of options far less painful than a rewrite.

For the original issue of GC pauses, a narrow change is to move problem data to non-pointer-carrying types, or the bigger hammer of manually managed slices of those types. The second helps with fragmentation too. Some workloads can be split into multiple processes as a direct way to have smaller heaps. If none of those options are enough then off-heap storage lets you do whatever you want.
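
A manually managed slice of pointer-free values might look like the sketch below: one large slice with a free list of reusable slot indices. `Point` and `Slab` are hypothetical names, and the sketch skips safety checks such as detecting a double free.

```go
package main

import "fmt"

// Point is a pointer-free record.
type Point struct{ X, Y float64 }

// Slab hands out slots from one large slice and recycles freed slots,
// so the runtime sees a single allocation instead of many small objects.
type Slab struct {
	items []Point
	free  []int32 // indices of slots available for reuse
}

// Alloc stores p and returns its slot index.
func (s *Slab) Alloc(p Point) int32 {
	if n := len(s.free); n > 0 {
		i := s.free[n-1]
		s.free = s.free[:n-1]
		s.items[i] = p
		return i
	}
	s.items = append(s.items, p)
	return int32(len(s.items) - 1)
}

// Free marks a slot as reusable. The caller must not use the index afterwards.
func (s *Slab) Free(i int32) { s.free = append(s.free, i) }

// Get returns a pointer to the slot for in-place updates. The pointer is
// only valid until the next Alloc, which may grow and move the slice.
func (s *Slab) Get(i int32) *Point { return &s.items[i] }

func main() {
	var s Slab
	a := s.Alloc(Point{1, 2})
	s.Alloc(Point{3, 4})
	s.Free(a)
	c := s.Alloc(Point{5, 6}) // reuses the slot freed above
	fmt.Println(a == c, len(s.items), s.Get(c).X) // prints "true 2 5"
}
```

Callers hold `int32` indices instead of pointers, which is what keeps the records invisible to pointer tracing; the cost is that lifetime bugs (use after `Free`) are no longer caught by the runtime.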

I do have some complaints about Go, but one of the big ones has been fixed since I last wrote much Go code and it seems like a fine choice for a lot of applications.