<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: antirez</title><link>https://news.ycombinator.com/user?id=antirez</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 08 May 2026 17:51:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=antirez" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>It's a mix of extreme sparsity with the routed experts still doing a non-trivial amount of work (and they are q8), plus the projections and routing not being quantized either. Also the fact that it's a QAT model must play a role, I guess, and I quantized the routed experts' out layers with Q2 instead of IQ2_XXS to retain quality.</p>
]]></description><pubDate>Fri, 08 May 2026 05:39:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48059017</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48059017</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48059017</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>Yep, that happens with coding agents sending a very large system prompt, and also when later tool calls feed it large files or diffs. But with the M3 Ultra the prefill speed is almost 500 t/s, which is well into the very usable zone. With the M3 Max you need a bit more patience, but it works well, and since it emits the thinking process, if you use the pi agent you don't just wait: you read the uncensored chain of thought. I posted a video on X yesterday using it with my M3 Max. It spills tokens at a decent speed.</p>
]]></description><pubDate>Fri, 08 May 2026 05:35:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48058986</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48058986</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48058986</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>It runs both the q2 and the original (4-bit routed experts), at more or less the same speed. The q2 quants are not what you might expect: they work extremely well, for a few reasons. For the full model you need a Mac with 256GB.</p>
]]></description><pubDate>Thu, 07 May 2026 20:37:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48054621</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48054621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48054621</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>DS4 can process 460 prompt tokens per second on an M3 Max. Not stellar, but not that slow either. See the benchmarks in the README.</p>
]]></description><pubDate>Thu, 07 May 2026 19:11:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=48053498</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48053498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48053498</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>True quantitatively, not qualitatively. DeepSeek V4 is not capable of doing what a human brain can do, of course, but the tasks it can do, it does at a speed that is completely impossible for a human, so comparing the two requires some normalization for speed.</p>
]]></description><pubDate>Thu, 07 May 2026 18:06:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48052694</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48052694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48052694</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>A random, funny, interesting and telling data point: my MacBook M3 Max, while DS4 is generating tokens at full speed, peaks at 50W of power usage...</p>
]]></description><pubDate>Thu, 07 May 2026 17:52:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48052517</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48052517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48052517</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>It was not my best (nor my usual) behavior, but the point in this case is that the OP offered very little in his rebuttal. A more contextualized reply would have improved mine as well. I actually believe the person who published this LLM course on GitHub works at ElevenLabs, as Google shows. So the reply could have been: "Are you sure? I googled it and apparently he works for ElevenLabs". That would have triggered a different reply. So I was not polite enough, and I said sorry, but given the exchange, saying "google it" was not terrible: it was exactly <i>how</i> I thought I had found it (I googled the wrong name, but citing MLX, plus X, and Google returned the wrong result). So it was a matter of "I did it this way".</p>
]]></description><pubDate>Tue, 05 May 2026 18:05:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48026221</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48026221</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48026221</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>You are right, sorry. The name is very similar and I thought it was: <a href="https://x.com/angeloskath" rel="nofollow">https://x.com/angeloskath</a></p>
]]></description><pubDate>Tue, 05 May 2026 14:06:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=48022728</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48022728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48022728</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>Google the name of the author.</p>
]]></description><pubDate>Tue, 05 May 2026 13:19:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48022176</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48022176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48022176</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>Context: he is one of the MLX developers, a skilled ML researcher.</p>
]]></description><pubDate>Tue, 05 May 2026 07:16:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=48019095</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48019095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48019095</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>If Pro is the same model (hard to tell, I'm not sure), it has a thinking token budget (test-time scaling) that is huge compared to the Codex endpoint.</p>
]]></description><pubDate>Mon, 04 May 2026 18:31:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012865</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012865</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012865</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Redis was built entirely this way since the start. I believe this is a better way to create software. Compromise in design is, in my opinion, something to avoid: feedback is important, but oftentimes a single person who has studied the problem deeply and has design taste can come up with a great solution. Mediating between solutions, even between two stellar solutions A and B, will not produce a C solution that is better, since you can't produce such a solution by interpolation; it is easier to damage both A and B. Moreover, it is rare that in a big group of people everyone has stellar ideas, so you often have to mediate with people having poor ideas too. Not worth the effort, for the way I'm wired. What works better for me is to provide hints about what I'm doing; then I receive feedback, sometimes there are really great ideas in it, and I incorporate the parts I like.</p>
]]></description><pubDate>Mon, 04 May 2026 18:16:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012640</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012640</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012640</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>1. The huge jump from Opus to GPT 5.3. A game changer. GPT 5.4 and 5.5 were better, but only incrementally so.<p>2. Nope, I don't give them much personality, but I use subtle prompt differences to maximize certain responses I want: to make the model focus on a given detail, or act with a specific kind of engineering mindset.<p>3. It never happened that the AI slowed me down, since I always had the full context and code details of what was happening in mind. I believe this happens more when you don't have a clear idea. Also, GPT >= 5.3/5.4 is not the past generation of models; it is very hard to trap it in a situation where it seems unable to understand what you mean.<p>4. A few times the AI provided fresh insights that I really liked. Most of the time it was the other way around. Certain implementations were written by the AI at a very impressive level of quality.<p>5. I don't use general skills; I build skills with deep search when needed for specific projects, and build an AGENT.md that works as a knowledge base as I work with the AI. One thing I use a lot, when there is a very complex problem, is to tell GPT that I have a friend called Machiavelli who is an incredible computer scientist, and to write him an email in /tmp/letter.md describing the problem we are facing, and that I'll try to get a reply. Then I ask GPT 5.5 Pro on the web, with extensive reasoning enabled: it will sometimes take 30 minutes or more to reply. Often, after I feed the reply back, the agent is able to see things a lot more clearly.</p>
]]></description><pubDate>Mon, 04 May 2026 18:15:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012629</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012629</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>The code is 5000 lines in total, comments included:<p>2000 lines for the sparse array.<p>2000 lines for the t_array commands and the upper-layer implementation.<p>~500 lines of AOF / RDB code.<p>Everything else is tests, JSON command descriptions, and the TRE library under "deps".</p>
]]></description><pubDate>Mon, 04 May 2026 18:10:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012554</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012554</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012554</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Unfortunately not; sorted sets are actually a bit on the other side of the spectrum: they are semantically sound, but absolutely wasteful because of the <i>combined</i> skiplist + array. Also, if the underlying representation is not an array, range queries and ring buffers will never be as efficient and compact as they should be. In theory you can do everything with everything, but segmenting what each API can do lets you exploit the use cases to provide the best underlying implementation.</p>
]]></description><pubDate>Mon, 04 May 2026 18:08:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012535</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012535</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Redis sets the locale at startup to avoid issues, so it should be OK, but we will document that, for instance, è will not match È when nocase is used.</p>
]]></description><pubDate>Mon, 04 May 2026 17:33:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011975</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011975</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011975</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>KEYS comes immediately to mind :)</p>
]]></description><pubDate>Mon, 04 May 2026 17:27:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011873</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011873</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Once I realized arrays were a great fit for text files, many use cases I could conceive were still limited by the fact that we need to grep files. So I thought: what is the AROP equivalent for files? ARGREP. Then I made sure to add both fast exact matching and regexp matching, so that the best tool can be used depending on the use case. I then discovered that for many OR-ed strings, a regexp could be the fastest way, if well optimized. And then I specialized TRE a bit.</p>
]]></description><pubDate>Mon, 04 May 2026 17:09:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011585</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011585</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>I should probably remove the AdSense JS, which I don't use anyway...</p>
]]></description><pubDate>Mon, 04 May 2026 16:35:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011022</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011022</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>I appreciate your kind reply as well :)</p>
]]></description><pubDate>Mon, 04 May 2026 16:17:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48010690</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48010690</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48010690</guid></item></channel></rss>