<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: antirez</title><link>https://news.ycombinator.com/user?id=antirez</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 08 May 2026 17:51:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=antirez" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>It's a mix of extreme sparsity with the routed experts still doing a non-trivial amount of work (and they are q8), plus the projections and routing not being quantized either. Also the fact that it's a QAT model must play a role, I guess, and I quantized the routed experts' out layers with Q2 instead of IQ2_XXS to retain quality.</p>
]]></description><pubDate>Fri, 08 May 2026 05:39:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48059017</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48059017</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48059017</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>Yep, that happens with coding agents sending a very large system prompt, and also when later tool calls feed it large files or diffs. But with the M3 Ultra the prefill speed is almost 500 t/s, which is well into the very usable zone. With the M3 Max you need a bit more patience, but it works well, and since it emits the thinking process, if you use the pi agent you don't just wait: you read the uncensored chain of thought. I posted a video on X yesterday using it with my M3 Max. It spills tokens at a decent speed.</p>
]]></description><pubDate>Fri, 08 May 2026 05:35:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48058986</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48058986</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48058986</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>It runs both the q2 and the original (4-bit routed experts), at more or less the same speed. The q2 quants are not what you might expect: they work extremely well, for a few reasons. For the full model you need a Mac with 256GB.</p>
]]></description><pubDate>Thu, 07 May 2026 20:37:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48054621</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48054621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48054621</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>DS4 can process 460 prompt tokens per second on an M3 Max. Not stellar, but not that slow either. See the benchmarks in the README.</p>
]]></description><pubDate>Thu, 07 May 2026 19:11:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=48053498</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48053498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48053498</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>True quantitatively, not qualitatively. DeepSeek V4 is not capable of doing what a human brain can do, of course, but the tasks it can do, it does at a speed that is completely impossible for a human, so comparing the two requires some normalization for speed.</p>
]]></description><pubDate>Thu, 07 May 2026 18:06:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48052694</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48052694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48052694</guid></item><item><title><![CDATA[New comment by antirez in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>A random, funny, interesting and telling data point: my MacBook M3 Max, while DS4 is generating tokens at full speed, peaks at 50W of power usage...</p>
]]></description><pubDate>Thu, 07 May 2026 17:52:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48052517</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48052517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48052517</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>It was not my best (nor my usual) behavior, but the point in this case is that the OP offered very little in his rebuttal. A more contextualized reply would have improved mine as well. I actually believe the person who published this LLM course on GitHub works at ElevenLabs, as Google shows. So the reply could have been: "Are you sure? I googled it and apparently he works for ElevenLabs". That would have triggered a different reply. So I was not polite enough, and I said sorry, but given the exchange, saying "google it" was not terrible: it was exactly <i>how</i> I thought I had found it (I googled the wrong name, but citing MLX, plus X, and Google returned the wrong result). So it was a matter of "I did it this way".</p>
]]></description><pubDate>Tue, 05 May 2026 18:05:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48026221</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48026221</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48026221</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>You are right, sorry. The name is very similar and I thought it was: <a href="https://x.com/angeloskath" rel="nofollow">https://x.com/angeloskath</a></p>
]]></description><pubDate>Tue, 05 May 2026 14:06:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=48022728</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48022728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48022728</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>Google the name of the author.</p>
]]></description><pubDate>Tue, 05 May 2026 13:19:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48022176</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48022176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48022176</guid></item><item><title><![CDATA[New comment by antirez in "Train Your Own LLM from Scratch"]]></title><description><![CDATA[
<p>Context: he is one of the MLX developers, a skilled ML researcher.</p>
]]></description><pubDate>Tue, 05 May 2026 07:16:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=48019095</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48019095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48019095</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>If Pro is the same model (hard to tell, I'm not sure), it has a thinking token budget (test-time scaling) that is huge compared to the Codex endpoint.</p>
]]></description><pubDate>Mon, 04 May 2026 18:31:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012865</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012865</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012865</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Redis was built entirely this way since the start. I believe this is a better way to create software. Compromise in design is, in my opinion, something to avoid: feedback is important, but oftentimes a single person who has studied the problem deeply and has design taste can come up with a great solution. Mediating between solutions, even between two stellar solutions A and B, will not produce a C solution that is better, since you can't produce such a solution by interpolation; it is easier to damage both A and B. Moreover, it is rare that in a big group of people everyone has stellar ideas, so you often have to mediate with people having poor ideas too. Not worth the effort, for the way I'm wired. What works better for me is to provide hints about what I'm doing; then I receive feedback, sometimes there are really great ideas in it, and I incorporate the parts I like.</p>
]]></description><pubDate>Mon, 04 May 2026 18:16:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012640</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012640</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012640</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>1. The huge jump from Opus to GPT 5.3. A game changer. GPT 5.4 and 5.5 were better, but only incrementally so.<p>2. Nope, I don't give them much personality, but I use subtle prompt differences to maximize certain responses I want: to make the model focus on a given detail, or act with a specific kind of engineering mindset.<p>3. It never happened that the AI slowed me down, since I always had the full context and code details of what was happening in mind. I believe this happens more when you don't have a clear idea. Also, GPT >= 5.3/5.4 is not the past generation of models; it is very hard to trap it in a situation where it seems unable to understand what you mean.<p>4. A few times the AI provided fresh insights that I really liked. Most of the time it was the other way around. Certain implementations were written by the AI at a very impressive level of quality.<p>5. I don't use general skills; I build skills with deep search when needed for specific projects, and build an AGENT.md that works as a knowledge base as I work with the AI. One thing I use a lot, when there is a very complex problem, is to tell GPT that I have a friend called Machiavelli who is an incredible computer scientist, and to write him an email in /tmp/letter.md describing the problem we are facing, and that I'll try to get a reply. Then I ask GPT 5.5 Pro on the web, with extensive reasoning enabled: it will sometimes take 30 minutes or more to reply. Often, after I feed the reply back, the agent is able to see things a lot more clearly.</p>
]]></description><pubDate>Mon, 04 May 2026 18:15:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012629</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012629</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>The code is 5000 lines in total, comments included:<p>2000 lines for the sparse array.<p>2000 lines for the t_array commands and the upper-layer implementation.<p>~500 lines of AOF / RDB code.<p>Everything else is tests, JSON command descriptions, and the TRE library under "deps".</p>
]]></description><pubDate>Mon, 04 May 2026 18:10:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012554</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012554</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012554</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Unfortunately not; sorted sets are actually a bit on the other side of the spectrum: they are semantically sound, but absolutely wasteful because of the <i>combined</i> skiplist + array. Also, if the underlying representation is not an array, range queries and ring buffers will never be as efficient and compact as they should be. In theory you can do everything with everything, but segmenting what each API can do lets you exploit the use cases to provide the best underlying implementation.</p>
]]></description><pubDate>Mon, 04 May 2026 18:08:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48012535</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48012535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48012535</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Redis sets the locale at startup to avoid issues, so it should be OK, but we will document that, for instance, è will not match È when nocase is used.</p>
]]></description><pubDate>Mon, 04 May 2026 17:33:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011975</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011975</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011975</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>KEYS comes immediately to mind :)</p>
]]></description><pubDate>Mon, 04 May 2026 17:27:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011873</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011873</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>Once I realized arrays were a great fit for text files, many use cases I could conceive were still limited by the fact that we need to grep files. So I thought: what is the AROP equivalent for files? ARGREP. Then I made sure to add both fast exact matching and regexp matching, so that the best tool can be used depending on the use case. I then discovered that for many OR-ed strings, a regexp could be the fastest way, if well optimized. And then I specialized TRE a bit.</p>
]]></description><pubDate>Mon, 04 May 2026 17:09:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011585</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011585</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>I should probably remove the AdSense JS, which I don't use anyway...</p>
]]></description><pubDate>Mon, 04 May 2026 16:35:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48011022</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48011022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48011022</guid></item><item><title><![CDATA[New comment by antirez in "Redis array: short story of a long development process"]]></title><description><![CDATA[
<p>I appreciate your kind reply as well :)</p>
]]></description><pubDate>Mon, 04 May 2026 16:17:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48010690</link><dc:creator>antirez</dc:creator><comments>https://news.ycombinator.com/item?id=48010690</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48010690</guid></item></channel></rss>