<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: benob</title><link>https://news.ycombinator.com/user?id=benob</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 13 May 2026 14:37:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=benob" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by benob in "Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model"]]></title><description><![CDATA[
<p>Deployed it to a Hugging Face space: <a href="https://huggingface.co/spaces/benoitfavre/needle-playground" rel="nofollow">https://huggingface.co/spaces/benoitfavre/needle-playground</a><p>You can check the very simple Dockerfile there.</p>
]]></description><pubDate>Tue, 12 May 2026 20:34:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48114151</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=48114151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48114151</guid></item><item><title><![CDATA[New comment by benob in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Here is llama-bench on the same M4:<p><pre><code>  | model                    |       size |     params | backend    | threads |            test |                  t/s |
  | ------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
  | qwen35 27B Q4_K_M        |  15.65 GiB |    26.90 B | BLAS,MTL   |       4 |           pp512 |         61.31 ± 0.79 |
  | qwen35 27B Q4_K_M        |  15.65 GiB |    26.90 B | BLAS,MTL   |       4 |           tg128 |          5.52 ± 0.08 |
  | qwen35moe 35B.A3B Q3_K_M |  15.45 GiB |    34.66 B | BLAS,MTL   |       4 |           pp512 |        385.54 ± 2.70 |
  | qwen35moe 35B.A3B Q3_K_M |  15.45 GiB |    34.66 B | BLAS,MTL   |       4 |           tg128 |         26.75 ± 0.02 |
</code></pre>
So ~60 t/s for prefill and ~5 t/s for generation on the 27B, and roughly 5x that on the 35B-A3B.</p>
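As a rough cross-check of those ratios, here is the arithmetic on the t/s numbers from the llama-bench table above (illustrative only):

```python
# throughput (t/s) copied from the llama-bench table above
dense = {"pp512": 61.31, "tg128": 5.52}   # qwen35 27B Q4_K_M
moe = {"pp512": 385.54, "tg128": 26.75}   # qwen35moe 35B.A3B Q3_K_M

# MoE-over-dense speedup per test
speedup = {test: moe[test] / dense[test] for test in dense}
print(speedup)  # tg128 comes out near 5x, pp512 a bit above 6x
```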
]]></description><pubDate>Wed, 22 Apr 2026 19:19:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47868026</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47868026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47868026</guid></item><item><title><![CDATA[New comment by benob in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>I get ~5 tokens/s on an M4 with 32 GB of RAM, using:<p><pre><code>  llama-server \
   -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
   --no-mmproj \
   --fit on \
   -np 1 \
   -c 65536 \
   --cache-ram 4096 -ctxcp 2 \
   --jinja \
   --temp 0.6 \
   --top-p 0.95 \
   --top-k 20 \
   --min-p 0.0 \
   --presence-penalty 0.0 \
   --repeat-penalty 1.0 \
   --reasoning on \
   --chat-template-kwargs '{"preserve_thinking": true}'
</code></pre>
The 35B-A3B model runs at ~25 t/s. For comparison, on an A100 (roughly an RTX 3090 with more memory) they reach 41 t/s and 97 t/s respectively.<p>I haven't tested the 27B model yet, but 35B-A3B often goes off the rails after 15k-20k tokens of context. You can have it do basic things reliably, but certainly not at the level of "frontier" models.</p>
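For reference, llama-server exposes an OpenAI-compatible HTTP API (it listens on port 8080 by default); a minimal sketch of a request against a locally running instance, with the sampling parameters from the command line above:

```python
import json
import urllib.request

# assumed local endpoint; llama-server defaults to port 8080
URL = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Write a haiku about M4 Macs."}],
    "temperature": 0.6,
    "top_p": 0.95,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```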
]]></description><pubDate>Wed, 22 Apr 2026 15:38:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865140</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47865140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865140</guid></item><item><title><![CDATA[New comment by benob in "Making RAM at Home [video]"]]></title><description><![CDATA[
<p>I miss the comment tagging system: insightful, informative, interesting, funny. It would make sense for HN.</p>
]]></description><pubDate>Wed, 22 Apr 2026 06:50:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47859956</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47859956</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47859956</guid></item><item><title><![CDATA[New comment by benob in "Every plane you see in the sky – you can now follow it from the cockpit in 3D"]]></title><description><![CDATA[
<p>Space station tracking:
<a href="https://flight-viz.com/cockpit.html?lat=40.64&lon=-73.78&alt=413000&hdg=220&spd=28000&cs=ISS" rel="nofollow">https://flight-viz.com/cockpit.html?lat=40.64&lon=-73.78&alt...</a></p>
]]></description><pubDate>Sun, 12 Apr 2026 07:22:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736941</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47736941</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736941</guid></item><item><title><![CDATA[New comment by benob in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>I just realized that a hash function is nothing more than the output of a deterministic random number generator XORed with some data.</p>
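A toy sketch of that view (a hypothetical construction for illustration, not any real hash): XOR each data byte into the state of a deterministic PRNG, then advance the generator.

```python
def toy_hash(data: bytes) -> int:
    # xorshift64: a simple deterministic pseudo-random number generator
    def xorshift64(x: int) -> int:
        x ^= (x << 13) & 0xFFFFFFFFFFFFFFFF
        x ^= x >> 7
        x ^= (x << 17) & 0xFFFFFFFFFFFFFFFF
        return x & 0xFFFFFFFFFFFFFFFF

    state = 0x9E3779B97F4A7C15  # arbitrary nonzero seed
    for b in data:
        # fold the next data byte into the PRNG state via XOR, then step
        state = xorshift64(state ^ b)
    return state
```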
]]></description><pubDate>Sun, 12 Apr 2026 07:09:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736889</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47736889</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736889</guid></item><item><title><![CDATA[New comment by benob in "Exploiting the most prominent AI agent benchmarks"]]></title><description><![CDATA[
<p>No, the failure is the human-written prompt.</p>
]]></description><pubDate>Sun, 12 Apr 2026 05:22:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736369</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47736369</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736369</guid></item><item><title><![CDATA[New comment by benob in "What if the browser built the UI for you?"]]></title><description><![CDATA[
<p>The author emphasizes accessibility and coherence as benefits, but another interesting one is composability, which does not emerge naturally in the world of UIs. Imagine creating a single UI for a pair of websites, the way a command line composes grep and wc. LLMs already provide that, but through the natural-language interaction primitive. UIs could allow for branded experiences, ad delivery and whatnot in ways that natural language doesn't.</p>
]]></description><pubDate>Sun, 05 Apr 2026 06:42:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646767</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47646767</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646767</guid></item><item><title><![CDATA[New comment by benob in "EmDash – a spiritual successor to WordPress that solves plugin security"]]></title><description><![CDATA[
<p>"That allows us to license the open source project under the more permissive MIT license."</p>
]]></description><pubDate>Wed, 01 Apr 2026 16:48:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47603351</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47603351</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47603351</guid></item><item><title><![CDATA[New comment by benob in "Google's 200M-parameter time-series foundation model with 16k context"]]></title><description><![CDATA[
<p>I would say:<p>- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors<p>- memorization: some patterns recur across many domains, such as power laws<p>- multitask: exploit cross-domain connections, such as weather vs electricity</p>
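On the decomposition point, a minimal sketch of the idea with a plain FFT on synthetic data (numpy assumed available): the dominant spectral bins recover the underlying seasonal periods.

```python
import numpy as np

t = np.arange(256)
# synthetic series: two seasonal components, periods 32 and 8
series = 2.0 * np.sin(2 * np.pi * t / 32) + 0.5 * np.sin(2 * np.pi * t / 8)

spectrum = np.abs(np.fft.rfft(series))
# the two strongest frequency bins correspond to the two underlying periods
strongest = np.argsort(spectrum)[-2:]
periods = sorted(len(t) / strongest)
print(periods)  # [8.0, 32.0]
```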
]]></description><pubDate>Tue, 31 Mar 2026 06:03:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583309</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47583309</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583309</guid></item><item><title><![CDATA[New comment by benob in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>Ollama is a user-friendly UI for LLM inference. It is powered by llama.cpp (or a fork of it) which is more power-user oriented and requires command-line wrangling. GGML is the math library behind llama.cpp and GGUF is the associated file format used for storing LLM weights.</p>
]]></description><pubDate>Tue, 31 Mar 2026 05:48:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583209</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47583209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583209</guid></item><item><title><![CDATA[New comment by benob in "TurboQuant: Redefining AI efficiency with extreme compression"]]></title><description><![CDATA[
<p>Maybe they quantized the model parameters a bit too much...</p>
]]></description><pubDate>Wed, 25 Mar 2026 07:09:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47514233</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47514233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47514233</guid></item><item><title><![CDATA[New comment by benob in "TurboQuant: Redefining AI efficiency with extreme compression"]]></title><description><![CDATA[
<p>This is the worst explanation of an AI component for lay people that I have seen in a long time. It doesn't even seem AI-generated.</p>
]]></description><pubDate>Wed, 25 Mar 2026 07:02:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47514198</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47514198</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47514198</guid></item><item><title><![CDATA[New comment by benob in "Arm AGI CPU"]]></title><description><![CDATA[
<p>This reminds me of Intel talking about faster web browsing with the new Pentium</p>
]]></description><pubDate>Tue, 24 Mar 2026 19:49:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47508071</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47508071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47508071</guid></item><item><title><![CDATA[New comment by benob in "Prompt Injecting Contributing.md"]]></title><description><![CDATA[
<p>The real question is when will you resort to bots for rejecting low-quality PRs, and when will contributing bots generate prompt injections to fool your bots into merging their PRs?</p>
]]></description><pubDate>Thu, 19 Mar 2026 17:54:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47443273</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47443273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47443273</guid></item><item><title><![CDATA[New comment by benob in "Pretraining Language Models via Neural Cellular Automata"]]></title><description><![CDATA[
<p>Reminds me of "Universal pre-training by iterated random computation" <a href="https://arxiv.org/pdf/2506.20057" rel="nofollow">https://arxiv.org/pdf/2506.20057</a>, with a slightly less formal approach.<p>I wonder if there is a closed-form solution for these kinds of initialization methods (call them pre-training if you wish): one that would allow attention heads to detect a variety of diverse patterns, yet be more structured than random init.</p>
]]></description><pubDate>Thu, 19 Mar 2026 14:19:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47439961</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47439961</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47439961</guid></item><item><title><![CDATA[New comment by benob in "Zig – Type Resolution Redesign and Language Changes"]]></title><description><![CDATA[
<p>Time to start zig++</p>
]]></description><pubDate>Wed, 11 Mar 2026 09:17:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333342</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47333342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333342</guid></item><item><title><![CDATA[New comment by benob in "AI and the Ship of Theseus"]]></title><description><![CDATA[
<p>It's funny that the real value is now in test suites. Or maybe it always has been...</p>
]]></description><pubDate>Fri, 06 Mar 2026 07:22:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47272020</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47272020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47272020</guid></item><item><title><![CDATA[New comment by benob in "Relicensing with AI-Assisted Rewrite"]]></title><description><![CDATA[
<p>I don't think this would qualify as a clean-room implementation (the library was involved in training the model to generate programs in the first place). However, it should be possible to remove the library from the OLMo training data and retrain from scratch.<p>But what about training without having seen any human-written program? Could a model learn from randomly generated programs?</p>
]]></description><pubDate>Thu, 05 Mar 2026 06:28:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47258248</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47258248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47258248</guid></item><item><title><![CDATA[New comment by benob in "Relicensing with AI-Assisted Rewrite"]]></title><description><![CDATA[
<p>What about doing that with movies and music?</p>
]]></description><pubDate>Thu, 05 Mar 2026 06:16:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47258166</link><dc:creator>benob</dc:creator><comments>https://news.ycombinator.com/item?id=47258166</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47258166</guid></item></channel></rss>