<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: benob</title><link>https://news.ycombinator.com/user?id=benob</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 13 May 2026 14:37:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=benob" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by benob in "Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model"]]></title><description><![CDATA[
<p>Deployed it to a Hugging Face space: <a href="https://huggingface.co/spaces/benoitfavre/needle-playground" rel="nofollow">https://huggingface.co/spaces/benoitfavre/needle-playground</a><p>You can check the very simple Dockerfile there.</p>
]]></description><pubDate>Tue, 12 May 2026 20:34:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48114151</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=48114151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48114151</guid></item><item><title><![CDATA[New comment by benob in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Here is llama-bench on the same M4:<p><pre><code>  | model                    |       size |     params | backend    | threads |            test |                  t/s |
  | ------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
  | qwen35 27B Q4_K_M        |  15.65 GiB |    26.90 B | BLAS,MTL   |       4 |           pp512 |         61.31 ± 0.79 |
  | qwen35 27B Q4_K_M        |  15.65 GiB |    26.90 B | BLAS,MTL   |       4 |           tg128 |          5.52 ± 0.08 |
  | qwen35moe 35B.A3B Q3_K_M |  15.45 GiB |    34.66 B | BLAS,MTL   |       4 |           pp512 |        385.54 ± 2.70 |
  | qwen35moe 35B.A3B Q3_K_M |  15.45 GiB |    34.66 B | BLAS,MTL   |       4 |           tg128 |         26.75 ± 0.02 |
</code></pre>
So ~60 t/s for prefill and ~5 t/s for generation on the 27B, and roughly 5x that on the 35B-A3B.</p>
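As a rough cross-check of those ratios, here is the arithmetic on the t/s numbers from the llama-bench table above (illustrative only):

```python
# throughput (t/s) copied from the llama-bench table above
dense = {"pp512": 61.31, "tg128": 5.52}   # qwen35 27B Q4_K_M
moe = {"pp512": 385.54, "tg128": 26.75}   # qwen35moe 35B.A3B Q3_K_M

# MoE-over-dense speedup per test
speedup = {test: moe[test] / dense[test] for test in dense}
print(speedup)  # tg128 comes out near 5x, pp512 a bit above 6x
```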
]]></description><pubDate>Wed, 22 Apr 2026 19:19:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47868026</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47868026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47868026</guid></item><item><title><![CDATA[New comment by benob in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>I get ~5 tokens/s on an M4 with 32 GB of RAM, using:<p><pre><code>  llama-server \
   -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
   --no-mmproj \
   --fit on \
   -np 1 \
   -c 65536 \
   --cache-ram 4096 -ctxcp 2 \
   --jinja \
   --temp 0.6 \
   --top-p 0.95 \
   --top-k 20 \
   --min-p 0.0 \
   --presence-penalty 0.0 \
   --repeat-penalty 1.0 \
   --reasoning on \
   --chat-template-kwargs '{"preserve_thinking": true}'
</code></pre>
The 35B-A3B model runs at ~25 t/s. For comparison, on an A100 (roughly an RTX 3090 with more memory) they reach 41 t/s and 97 t/s respectively.<p>I haven't tested the 27B model yet, but 35B-A3B often goes off the rails after 15k-20k tokens of context. You can have it do basic things reliably, but certainly not at the level of "frontier" models.</p>
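For reference, llama-server exposes an OpenAI-compatible HTTP API (it listens on port 8080 by default); a minimal sketch of a request against a locally running instance, with the sampling parameters from the command line above:

```python
import json
import urllib.request

# assumed local endpoint; llama-server defaults to port 8080
URL = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Write a haiku about M4 Macs."}],
    "temperature": 0.6,
    "top_p": 0.95,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```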
]]></description><pubDate>Wed, 22 Apr 2026 15:38:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865140</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47865140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865140</guid></item><item><title><![CDATA[New comment by benob in "Making RAM at Home [video]"]]></title><description><![CDATA[
<p>I miss the comment tagging system: insightful, informative, interesting, funny. It would make sense for HN.</p>
]]></description><pubDate>Wed, 22 Apr 2026 06:50:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47859956</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47859956</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47859956</guid></item><item><title><![CDATA[New comment by benob in "Every plane you see in the sky – you can now follow it from the cockpit in 3D"]]></title><description><![CDATA[
<p>Space station tracking:
<a href="https://flight-viz.com/cockpit.html?lat=40.64&lon=-73.78&alt=413000&hdg=220&spd=28000&cs=ISS" rel="nofollow">https://flight-viz.com/cockpit.html?lat=40.64&lon=-73.78&alt...</a></p>
]]></description><pubDate>Sun, 12 Apr 2026 07:22:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736941</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47736941</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736941</guid></item><item><title><![CDATA[New comment by benob in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>I just realized that a hash function is nothing more than the output of a deterministic random number generator XORed with some data.</p>
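A toy sketch of that view (a hypothetical construction for illustration, not any real hash): XOR each data byte into the state of a deterministic PRNG, then advance the generator.

```python
def toy_hash(data: bytes) -> int:
    # xorshift64: a simple deterministic pseudo-random number generator
    def xorshift64(x: int) -> int:
        x ^= (x << 13) & 0xFFFFFFFFFFFFFFFF
        x ^= x >> 7
        x ^= (x << 17) & 0xFFFFFFFFFFFFFFFF
        return x & 0xFFFFFFFFFFFFFFFF

    state = 0x9E3779B97F4A7C15  # arbitrary nonzero seed
    for b in data:
        # fold the next data byte into the PRNG state via XOR, then step
        state = xorshift64(state ^ b)
    return state
```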
]]></description><pubDate>Sun, 12 Apr 2026 07:09:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736889</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47736889</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736889</guid></item><item><title><![CDATA[New comment by benob in "Exploiting the most prominent AI agent benchmarks"]]></title><description><![CDATA[
<p>No, the failure is the human-written prompt.</p>
]]></description><pubDate>Sun, 12 Apr 2026 05:22:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736369</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47736369</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736369</guid></item><item><title><![CDATA[New comment by benob in "What if the browser built the UI for you?"]]></title><description><![CDATA[
<p>The author emphasizes accessibility and coherence as benefits, but another interesting one is composability, which does not emerge naturally in the world of UIs. Imagine creating a single UI for a pair of websites, the way a command line composes grep and wc. LLMs already provide that, but through the natural-language interaction primitive. UIs could allow for branded experiences, ad delivery and whatnot in ways that natural language doesn't.</p>
]]></description><pubDate>Sun, 05 Apr 2026 06:42:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646767</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47646767</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646767</guid></item><item><title><![CDATA[New comment by benob in "EmDash – a spiritual successor to WordPress that solves plugin security"]]></title><description><![CDATA[
<p>"That allows us to license the open source project under the more permissive MIT license."</p>
]]></description><pubDate>Wed, 01 Apr 2026 16:48:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47603351</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47603351</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47603351</guid></item><item><title><![CDATA[New comment by benob in "Google's 200M-parameter time-series foundation model with 16k context"]]></title><description><![CDATA[
<p>I would say:<p>- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors<p>- memorization: some patterns recur across many domains, such as power laws<p>- multitask: exploit cross-domain connections, such as weather vs electricity</p>
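On the decomposition point, a minimal sketch of the idea with a plain FFT on synthetic data (numpy assumed available): the dominant spectral bins recover the underlying seasonal periods.

```python
import numpy as np

t = np.arange(256)
# synthetic series: two seasonal components, periods 32 and 8
series = 2.0 * np.sin(2 * np.pi * t / 32) + 0.5 * np.sin(2 * np.pi * t / 8)

spectrum = np.abs(np.fft.rfft(series))
# the two strongest frequency bins correspond to the two underlying periods
strongest = np.argsort(spectrum)[-2:]
periods = sorted(len(t) / strongest)
print(periods)  # [8.0, 32.0]
```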
]]></description><pubDate>Tue, 31 Mar 2026 06:03:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583309</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47583309</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583309</guid></item><item><title><![CDATA[New comment by benob in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>Ollama is a user-friendly UI for LLM inference. It is powered by llama.cpp (or a fork of it) which is more power-user oriented and requires command-line wrangling. GGML is the math library behind llama.cpp and GGUF is the associated file format used for storing LLM weights.</p>
]]></description><pubDate>Tue, 31 Mar 2026 05:48:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583209</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47583209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583209</guid></item><item><title><![CDATA[New comment by benob in "TurboQuant: Redefining AI efficiency with extreme compression"]]></title><description><![CDATA[
<p>Maybe they quantized the model parameters a bit too much...</p>
]]></description><pubDate>Wed, 25 Mar 2026 07:09:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47514233</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47514233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47514233</guid></item><item><title><![CDATA[New comment by benob in "TurboQuant: Redefining AI efficiency with extreme compression"]]></title><description><![CDATA[
<p>This is the worst explanation of an AI component for lay people that I have seen in a long time. It doesn't even seem AI-generated.</p>
]]></description><pubDate>Wed, 25 Mar 2026 07:02:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47514198</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47514198</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47514198</guid></item><item><title><![CDATA[New comment by benob in "Arm AGI CPU"]]></title><description><![CDATA[
<p>This reminds me of Intel talking about faster web browsing with the new Pentium</p>
]]></description><pubDate>Tue, 24 Mar 2026 19:49:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47508071</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47508071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47508071</guid></item><item><title><![CDATA[New comment by benob in "Prompt Injecting Contributing.md"]]></title><description><![CDATA[
<p>The real question is when will you resort to bots for rejecting low-quality PRs, and when will contributing bots generate prompt injections to fool your bots into merging their PRs?</p>
]]></description><pubDate>Thu, 19 Mar 2026 17:54:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47443273</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47443273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47443273</guid></item><item><title><![CDATA[New comment by benob in "Pretraining Language Models via Neural Cellular Automata"]]></title><description><![CDATA[
<p>Reminds me of "Universal pre-training by iterated random computation" <a href="https://arxiv.org/pdf/2506.20057" rel="nofollow">https://arxiv.org/pdf/2506.20057</a>, with a slightly less formal approach.<p>I wonder if there is a closed-form solution for these kinds of initialization methods (call them pre-training if you wish): one that would allow attention heads to detect a variety of diverse patterns, yet be more structured than random init.</p>
]]></description><pubDate>Thu, 19 Mar 2026 14:19:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47439961</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47439961</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47439961</guid></item><item><title><![CDATA[New comment by benob in "Zig – Type Resolution Redesign and Language Changes"]]></title><description><![CDATA[
<p>Time to start zig++</p>
]]></description><pubDate>Wed, 11 Mar 2026 09:17:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333342</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47333342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333342</guid></item><item><title><![CDATA[New comment by benob in "AI and the Ship of Theseus"]]></title><description><![CDATA[
<p>It's funny that the real value is now in test suites. Or maybe it always has been...</p>
]]></description><pubDate>Fri, 06 Mar 2026 07:22:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47272020</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47272020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47272020</guid></item><item><title><![CDATA[New comment by benob in "Relicensing with AI-Assisted Rewrite"]]></title><description><![CDATA[
<p>I don't think this would qualify as a clean-room implementation (the library was involved in training the model to generate programs in the first place). However, it should be possible to remove the library from the OLMo training data and retrain from scratch.<p>But what about training without having seen any human-written program? Could a model learn from randomly generated programs?</p>
]]></description><pubDate>Thu, 05 Mar 2026 06:28:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47258248</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47258248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47258248</guid></item><item><title><![CDATA[New comment by benob in "Relicensing with AI-Assisted Rewrite"]]></title><description><![CDATA[
<p>What about doing that with movies and music?</p>
]]></description><pubDate>Thu, 05 Mar 2026 06:16:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47258166</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47258166</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47258166</guid></item></channel></rss>