Hacker News: tarruda

New comment by tarruda in "Step 3.7 Flash"

tarruda — Fri, 29 May 2026 12:55:45 +0000

The official Q4_K_S GGUF is quite good and reaches a very good 35 tps generation on an M1 Mac Studio. It should be much faster on recent Macs, especially the M5.

Step 3.7 Flash

tarruda — Fri, 29 May 2026 12:51:43 +0000

Article URL: https://static.stepfun.com/blog/step-3.7-flash/

Comments URL: https://news.ycombinator.com/item?id=48322451

Points: 11

# Comments: 5

New comment by tarruda in "Claude Opus 4.8"

tarruda — Thu, 28 May 2026 17:12:48 +0000

> One of the most prominent improvements in Opus 4.8 is its honesty.

Does that mean it no longer deletes or changes tests to make them pass?

New comment by tarruda in "Deno 2.8"

tarruda — Fri, 22 May 2026 15:25:11 +0000

> safer bet as a dependency.

The recent 1-million-line vibe-coded PR suggests it is not that reliable as a dependency.

New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"

tarruda — Thu, 21 May 2026 10:00:33 +0000

> That's impressive getting a 397B down to <110GB

It is larger than 110GB. macOS allows up to 125G of RAM to be shared with the GPU, so it is certainly less than that!
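For a sense of scale, here is a rough Python sketch of the size arithmetic. It assumes the model in question is the 2.54 BPW quant of the 397B model mentioned elsewhere in this feed, that the file size is roughly params * BPW / 8, and that the 125G limit is in binary GiB. The helper name `gguf_size_bytes` is hypothetical; real GGUF files also carry metadata, and inference needs extra room for the KV cache.

```python
# Back-of-the-envelope GGUF size estimate (a sketch, not an exact file size).
# Assumption: size ~= parameter count * average bits per weight / 8;
# GGUF metadata and the runtime KV cache are ignored.

GB = 1000 ** 3    # decimal gigabyte
GIB = 1024 ** 3   # binary gibibyte


def gguf_size_bytes(n_params: float, bpw: float) -> float:
    """Approximate weight storage in bytes for a quant with the given average BPW."""
    return n_params * bpw / 8


size = gguf_size_bytes(397e9, 2.54)
print(f"~{size / GB:.1f} GB / ~{size / GIB:.1f} GiB")

# Assumption: the ~125G GPU-shareable limit is interpreted as binary GiB.
gpu_limit = 125 * GIB
print(f"headroom for KV cache and buffers: ~{(gpu_limit - size) / GIB:.1f} GiB")
```

This prints roughly 126.0 GB / 117.4 GiB and about 7.6 GiB of headroom: above 110GB in either unit, yet below the GPU-shareable limit when that limit is read as GiB.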

> HF link is broken though!

Doesn't seem broken to me, but you should be able to search for tarruda/Qwen3.5-397B-A17B-GGUF on Hugging Face.

New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"

tarruda — Wed, 20 May 2026 13:20:23 +0000

> I'm questioning ROI

If by ROI you mean saving money compared to paid APIs, then I don't think it is worth it. All you gain is full sovereignty over your AI usage.

New comment by tarruda in "Gemini 3.5 Flash"

tarruda — Wed, 20 May 2026 13:13:23 +0000

> 300B models at least fit in a single maxed out Mac Studio or a small stack of DGX Sparks or AMD Strix Halo boxes.

I run a 2.54 BPW GGUF of Qwen 3.5 397B on a 128GB Mac Studio at 20 tokens/second generation and 200 tokens/second prompt processing. I'm not suggesting it matches the performance of the full BF16 model, but I did run some benchmarks locally and the results were pretty good:

- MMLU: 87.96%

- GPQA Diamond: 86.36%

- IFEval: 91.13%

- GSM8K: 92.57%

So I think we have been at the "frontier capabilities at home" for a few months now.

New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"

tarruda — Wed, 20 May 2026 12:49:28 +0000

I only tried a very early version of that when it was just a llama.cpp fork, and Qwen was certainly better in my tests.

But I was not super impressed with DeepSeek 4 Flash via the official API either, so it doesn't seem to be the quantization's fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).

New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"

tarruda — Wed, 20 May 2026 12:41:05 +0000

> What’s the price point for getting into that sweet spot?

In October 2024 I got my M1 Ultra Mac Studio with 128GB; IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. Framework (https://frame.work/) is selling a 128GB Strix Halo mainboard for $2700, but you have to add storage and a case.

New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"

tarruda — Wed, 20 May 2026 12:35:14 +0000

I have a 128GB Mac Studio, and even the 397B was a happy surprise to me due to its high resilience to quantization.

I've created a 2.54 BPW quant that fits on my hardware with 128k context, running at 20 tps tg (token generation) and 200 tps pp (prompt processing), while maintaining high scores on many benchmarks: https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...
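To put the 20 tps tg / 200 tps pp figures in perspective, here is a hedged back-of-the-envelope latency estimate in Python. The helper `request_seconds` and the token counts are hypothetical illustrations, and the sketch assumes both rates stay constant across the whole context, which is not true in practice since both tend to drop as the context fills.

```python
# Rough request latency from the pp/tg throughput figures (a sketch).
# Assumption: prompt processing (pp) and token generation (tg) run at constant
# rates; real throughput usually declines as the context grows.


def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_tps: float = 200.0, tg_tps: float = 20.0) -> float:
    """Time to prefill the prompt and then generate the reply, in seconds."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# A 10k-token prompt followed by a 500-token answer:
print(f"{request_seconds(10_000, 500):.0f} s")            # 50 s + 25 s
# Filling ~128k tokens of context before a 1,000-token answer:
print(f"{request_seconds(128_000, 1_000) / 60:.1f} min")  # 640 s + 50 s
```

The two calls give about 75 s and 11.5 min, which shows why prompt processing speed matters as much as generation speed once contexts get long.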

New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"

tarruda — Wed, 20 May 2026 12:24:55 +0000

Looking forward to more open weight releases from Qwen, especially 122B and 397B.

New comment by tarruda in "Rewrite Bun in Rust has been merged"

tarruda — Thu, 14 May 2026 10:03:12 +0000

And as long as Bun doesn't break Claude Code, which only uses a subset of its APIs, this might just pay off.

New comment by tarruda in "Rewrite Bun in Rust has been merged"

tarruda — Thu, 14 May 2026 09:53:16 +0000

> I started looking at the commits, and it's basically solving the ,,tests not pass'' problem by changing the tests themselves

Not sure if these decisions were made by the LLM, but I've always felt that Claude is more prone to doing "shady stuff" like modifying tests than to finding correct solutions to problems.

GPT/Codex is more honest in this regard.

New comment by tarruda in ".de TLD offline due to DNSSEC?"

tarruda — Tue, 05 May 2026 21:09:56 +0000

Mailbox.org (also from Germany) seems to be experiencing issues too.

New comment by tarruda in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"

tarruda — Tue, 05 May 2026 17:39:00 +0000

They also published draft models for E4B and E2B. For those, the draft models have only 78M parameters: https://huggingface.co/google/gemma-4-E4B-it-assistant

New comment by tarruda in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"

tarruda — Tue, 05 May 2026 17:31:03 +0000

There is a newer PR which will probably be merged soon: https://github.com/ggml-org/llama.cpp/pull/22673

Show HN: An in-browser, Unix emulator powered by libghostty-vt

tarruda — Thu, 30 Apr 2026 21:53:43 +0000

A toy UNIX emulator I built with Rust, libghostty-vt and QuickJS.

Comments URL: https://news.ycombinator.com/item?id=47968715

Points: 2

# Comments: 0

New comment by tarruda in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"

tarruda — Mon, 06 Apr 2026 00:34:51 +0000

Codex offers the best out-of-the-box experience, especially due to its built-in sandboxing. The only drawback is that its edit tool requires the LLM to output a diff, which only GPT models are trained to do correctly.

New comment by tarruda in "What changes when you turn a Linux box into a router"

tarruda — Fri, 03 Apr 2026 22:25:15 +0000

I currently do something similar.

My router is a 16GB Intel N150 mini PC with dual NICs. The actual router OS runs in an OpenWrt VM managed by Incus (a VM/container hypervisor) that has both NICs passed through.

One of the NICs is connected to another OpenWrt Wi-Fi access point, and the other is connected to the ISP modem.

The N150 box also has a Wi-Fi card that I set up as an additional AP I can connect to if something goes wrong with the virtualization setup.

I've been running this for at least 6 months and it has been working pretty well.

New comment by tarruda in "StepFun 3.5 Flash is #1 cost-effective model for OpenClaw tasks (300 battles)"

tarruda — Thu, 02 Apr 2026 11:23:28 +0000

Benchmarks don't tell the whole story. For one-shot coding tasks, I found Step 3.5 Flash to be even stronger than Qwen 3.5 397B.