<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tarruda</title><link>https://news.ycombinator.com/user?id=tarruda</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 29 May 2026 16:17:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tarruda" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tarruda in "Step 3.7 Flash"]]></title><description><![CDATA[
<p>The official Q4_K_S GGUF is quite good and generates at a very good 35 tps on an M1 Mac Studio. It should be much faster on recent Macs, especially M5.</p>
]]></description><pubDate>Fri, 29 May 2026 12:55:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48322495</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48322495</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48322495</guid></item><item><title><![CDATA[Step 3.7 Flash]]></title><description><![CDATA[
<p>Article URL: <a href="https://static.stepfun.com/blog/step-3.7-flash/">https://static.stepfun.com/blog/step-3.7-flash/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48322451">https://news.ycombinator.com/item?id=48322451</a></p>
<p>Points: 11</p>
<p># Comments: 5</p>
]]></description><pubDate>Fri, 29 May 2026 12:51:43 +0000</pubDate><link>https://static.stepfun.com/blog/step-3.7-flash/</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48322451</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48322451</guid></item><item><title><![CDATA[New comment by tarruda in "Claude Opus 4.8"]]></title><description><![CDATA[
<p>> One of the most prominent improvements in Opus 4.8 is its honesty.<p>Does that mean it no longer deletes or changes tests to make them pass?</p>
]]></description><pubDate>Thu, 28 May 2026 17:12:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=48312132</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48312132</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48312132</guid></item><item><title><![CDATA[New comment by tarruda in "Deno 2.8"]]></title><description><![CDATA[
<p>> safer bet as a dependency.<p>The recent 1-million-line vibe-coded PR suggests it is not that reliable as a dependency.</p>
]]></description><pubDate>Fri, 22 May 2026 15:25:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48237202</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48237202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48237202</guid></item><item><title><![CDATA[New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>> That's impressive getting a 397B down to <110GB<p>It is higher than 110GB. macOS allows up to 125GB of RAM to be shared with the GPU, so it is certainly less than that!<p>> HF link is broken though!<p>Doesn't seem broken to me, but you should be able to search for tarruda/Qwen3.5-397B-A17B-GGUF on Hugging Face.</p>
]]></description><pubDate>Thu, 21 May 2026 10:00:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48220181</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48220181</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48220181</guid></item><item><title><![CDATA[New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>> I'm questioning ROI<p>If by ROI you mean saving money compared to paying for APIs, then I don't think it is worth it. All you gain is full sovereignty over your AI usage.</p>
]]></description><pubDate>Wed, 20 May 2026 13:20:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207264</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48207264</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207264</guid></item><item><title><![CDATA[New comment by tarruda in "Gemini 3.5 Flash"]]></title><description><![CDATA[
<p>> 300B models at least fit in a single maxed out Mac Studio or a small stack of DGX Sparks or AMD Strix Halo boxes.<p>I run a 2.54 BPW GGUF of the 397B Qwen 3.5 on a 128GB Mac Studio at 20 tokens/second generation and 200 tokens/second prompt processing. I'm not suggesting it matches the performance of the full BF16 model, but I did run some benchmarks locally and the results were pretty good:<p>- MMLU: 87.96%<p>- GPQA Diamond: 86.36%<p>- IFEval: 91.13%<p>- GSM8K: 92.57%<p>So I think we have had "frontier capabilities at home" for a few months now.</p>
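<p>To put the two throughput figures in perspective: prompt processing (prefill) and token generation run at very different speeds, so the wall-clock time of a request is roughly prompt_tokens / pp_speed + output_tokens / tg_speed. A minimal sketch, using the 200 and 20 tokens/second quoted above as defaults; the request sizes are hypothetical, and real speeds are not constant (they can drop as the context grows).</p>

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_tps: float = 200.0, tg_tps: float = 20.0) -> float:
    """Rough wall-clock estimate for one request.

    Assumes prompt processing and generation each run at a constant speed;
    in practice both can slow down as the context fills up.
    """
    prefill = prompt_tokens / pp_tps   # time to ingest the prompt
    decode = output_tokens / tg_tps    # time to generate the reply
    return prefill + decode


# Hypothetical request: a 10k-token prompt and a 500-token answer.
print(request_seconds(10_000, 500))  # 50.0 s prefill + 25.0 s decode -> 75.0
```

<p>In this hypothetical example the prefill term is the larger one, which is why pp speed can matter as much as tg speed for long-prompt workloads.</p>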
]]></description><pubDate>Wed, 20 May 2026 13:13:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207146</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48207146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207146</guid></item><item><title><![CDATA[New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>I only tried a very early version of that when it was just a llama.cpp fork, and Qwen was certainly better in my tests.<p>But I was not super impressed with DeepSeek 4 Flash when using it from the official API either, so it doesn't seem to be the quantization's fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).</p>
]]></description><pubDate>Wed, 20 May 2026 12:49:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206834</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48206834</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206834</guid></item><item><title><![CDATA[New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>> What’s the price point for getting into that sweet spot?<p>In October 2024 I got my M1 Ultra Mac Studio with 128GB; IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. <a href="https://frame.work/" rel="nofollow">https://frame.work/</a> is selling a 128GB Strix Halo mainboard for $2700, but you have to add storage and a case.</p>
]]></description><pubDate>Wed, 20 May 2026 12:41:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206747</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48206747</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206747</guid></item><item><title><![CDATA[New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>I have a 128GB Mac Studio, and even 397B was a happy surprise to me due to its high quantization resilience.<p>I've created a 2.54 BPW quant that fits on my hardware with 128k context, 20 tps token generation (tg) and 200 tps prompt processing (pp), while maintaining high scores on many benchmarks: <a href="https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discussions/1#69d142b4f17676f98e53c16a" rel="nofollow">https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...</a></p>
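<p>As a rough sanity check on why a 2.54 BPW quant of a 397B model fits on a 128GB machine: bits per weight (BPW) is an average over all tensors, so the weights take approximately parameters × BPW / 8 bytes. The sketch below assumes exactly 397e9 parameters; real GGUF files also carry metadata, and the KV cache for the context comes on top, so treat the result as an approximation.</p>

```python
def weights_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights: each weight costs BPW bits on average."""
    return n_params * bits_per_weight / 8  # 8 bits per byte


# Assumption: exactly 397e9 parameters at an average of 2.54 bits each.
size = weights_size_bytes(397e9, 2.54)
print(f"{size / 1e9:.1f} GB")     # decimal units
print(f"{size / 2**30:.1f} GiB")  # binary units, as RAM is usually sized
```

<p>Under these assumptions this prints about 126.0 GB, or 117.4 GiB, which fits in 128GB of unified memory but leaves little room for the 128k-context KV cache and the OS. For comparison, at 4 bits per weight the same parameter count would need about 198.5 GB.</p>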
]]></description><pubDate>Wed, 20 May 2026 12:35:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206686</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48206686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206686</guid></item><item><title><![CDATA[New comment by tarruda in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>Looking forward to more open-weight releases from Qwen, especially 122B and 397B.</p>
]]></description><pubDate>Wed, 20 May 2026 12:24:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206573</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48206573</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206573</guid></item><item><title><![CDATA[New comment by tarruda in "Rewrite Bun in Rust has been merged"]]></title><description><![CDATA[
<p>And as long as Bun doesn't break Claude Code, which only uses a subset of its APIs, this might just pay off.</p>
]]></description><pubDate>Thu, 14 May 2026 10:03:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48133262</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48133262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48133262</guid></item><item><title><![CDATA[New comment by tarruda in "Rewrite Bun in Rust has been merged"]]></title><description><![CDATA[
<p>> I started looking at the commits, and it's basically solving the "tests not pass" problem by changing the tests themselves<p>Not sure if these decisions were made by the LLM, but I've always felt that Claude is more prone to doing "shady stuff" like modifying tests than to finding correct solutions to problems.<p>GPT/Codex is more honest in this regard.</p>
]]></description><pubDate>Thu, 14 May 2026 09:53:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=48133185</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48133185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48133185</guid></item><item><title><![CDATA[New comment by tarruda in ".de TLD offline due to DNSSEC?"]]></title><description><![CDATA[
<p>Mailbox.org (also from Germany) seems to be experiencing issues too.</p>
]]></description><pubDate>Tue, 05 May 2026 21:09:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48028618</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48028618</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48028618</guid></item><item><title><![CDATA[New comment by tarruda in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"]]></title><description><![CDATA[
<p>They also published draft models for E4B and E2B. For those, the draft models are only 78M parameters: <a href="https://huggingface.co/google/gemma-4-E4B-it-assistant" rel="nofollow">https://huggingface.co/google/gemma-4-E4B-it-assistant</a></p>
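<p>For context on why a small drafter speeds things up: in generic speculative decoding, a cheap draft model proposes several tokens and the large target model verifies them, keeping the longest matching prefix. The sketch below is the simple greedy variant with hypothetical toy "models" (plain functions returning the next token). It is not Gemma 4's multi-token-prediction drafter or llama.cpp's implementation, and sampling-based variants use a rejection-sampling acceptance rule instead of exact matching.</p>

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int = 4) -> List[int]:
    """One greedy draft-then-verify round; returns the tokens to append to prefix."""
    # 1. The small draft model proposes k tokens one at a time (cheap).
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks every proposed position. Real engines score all
    #    k positions in one batched forward pass; that is where the speedup comes from.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: take the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the verification pass yields one bonus token.
    accepted.append(target(ctx))
    return accepted


def toy_target(seq: List[int]) -> int:
    return seq[-1] + 1  # "counts up"


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 3 else seq[-1] + 1  # agrees with the target except after 3


print(speculative_step(toy_target, toy_draft, [1], k=4))  # [2, 3, 4]
```

<p>Here the toy draft proposes [2, 3, 0, 1], which diverges at the third position, so the round still yields three tokens for what a real engine would do as a single batched verification pass. The payoff depends on how often the drafter agrees with the target.</p>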
]]></description><pubDate>Tue, 05 May 2026 17:39:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48025835</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48025835</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48025835</guid></item><item><title><![CDATA[New comment by tarruda in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"]]></title><description><![CDATA[
<p>There is a newer PR which will probably be merged soon: <a href="https://github.com/ggml-org/llama.cpp/pull/22673" rel="nofollow">https://github.com/ggml-org/llama.cpp/pull/22673</a></p>
]]></description><pubDate>Tue, 05 May 2026 17:31:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48025721</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=48025721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48025721</guid></item><item><title><![CDATA[Show HN: An in-browser, Unix emulator powered by libghostty-vt]]></title><description><![CDATA[
<p>A toy UNIX emulator I built with Rust, libghostty-vt and quickjs.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47968715">https://news.ycombinator.com/item?id=47968715</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 30 Apr 2026 21:53:43 +0000</pubDate><link>https://tarruda.github.io/</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=47968715</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47968715</guid></item><item><title><![CDATA[New comment by tarruda in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"]]></title><description><![CDATA[
<p>Codex offers the best out-of-the-box experience, especially thanks to its built-in sandboxing. The only drawback is that its edit tool requires the LLM to output a diff, which only GPT models are trained to do correctly.</p>
]]></description><pubDate>Mon, 06 Apr 2026 00:34:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47655510</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=47655510</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47655510</guid></item><item><title><![CDATA[New comment by tarruda in "What changes when you turn a Linux box into a router"]]></title><description><![CDATA[
<p>I currently do something similar.<p>My router is a 16GB N150 mini PC with dual NICs. The actual router OS runs inside an OpenWrt VM managed by Incus (a VM/container hypervisor), with both NICs passed through.<p>One of the NICs is connected to another OpenWrt Wi-Fi access point, and the other is connected to the ISP modem.<p>The N150 also has a Wi-Fi card that I set up as an additional AP I can connect to if something goes wrong with the virtualization setup.<p>I've been running this for at least 6 months and it has been working pretty well.</p>
]]></description><pubDate>Fri, 03 Apr 2026 22:25:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47633148</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=47633148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47633148</guid></item><item><title><![CDATA[New comment by tarruda in "StepFun 3.5 Flash is #1 cost-effective model for OpenClaw tasks (300 battles)"]]></title><description><![CDATA[
<p>Benchmarks don't tell the whole story. For one-shot coding tasks, I found Step 3.5 Flash to be even stronger than Qwen 3.5 397B.</p>
]]></description><pubDate>Thu, 02 Apr 2026 11:23:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47612928</link><dc:creator>tarruda</dc:creator><comments>https://news.ycombinator.com/item?id=47612928</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47612928</guid></item></channel></rss>