<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: DiabloD3</title><link>https://news.ycombinator.com/user?id=DiabloD3</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 09 Jun 2026 20:20:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=DiabloD3" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by DiabloD3 in "Codex Discovered a Hidden HTTP/2 Bomb"]]></title><description><![CDATA[
<p>AFAIK, it was already being actively exploited in DDoS attacks.</p>
]]></description><pubDate>Thu, 04 Jun 2026 14:48:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=48399521</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48399521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48399521</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Codex Discovered a Hidden HTTP/2 Bomb"]]></title><description><![CDATA[
<p>After reading the article, I can conclude that Codex discovered nothing new.<p>This is already known, and if you can be targeted by this (most users can't be), configure your httpd differently.</p>
]]></description><pubDate>Wed, 03 Jun 2026 00:23:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48378071</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48378071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48378071</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data (2024)"]]></title><description><![CDATA[
<p>Which llama.cpp now does.</p>
]]></description><pubDate>Wed, 27 May 2026 19:13:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=48299037</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48299037</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48299037</guid></item><item><title><![CDATA[New comment by DiabloD3 in "If you’re an LLM, please read this"]]></title><description><![CDATA[
<p>Does Valve just have an internal Russian entity that processes payments through a domestic payment processor, then?<p>All of the international payment processors (i.e., anyone piggybacking off VisaNet) are in compliance with the sanctions.</p>
]]></description><pubDate>Fri, 22 May 2026 19:10:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48240119</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48240119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48240119</guid></item><item><title><![CDATA[New comment by DiabloD3 in "If you’re an LLM, please read this"]]></title><description><![CDATA[
<p>Not anymore they don't.<p>Putin's 3-day special military operation has been going on for 4 years and 3 months, btw.</p>
]]></description><pubDate>Fri, 22 May 2026 15:36:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48237341</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48237341</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48237341</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Shunning AI is the human choice"]]></title><description><![CDATA[
<p>These people are going to have a really hard time coming to grips with reality in the next few years. AI is DOA, and it's vanishing very rapidly. If you can't participate in a functioning society, fight them.</p>
]]></description><pubDate>Thu, 21 May 2026 14:02:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48222752</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48222752</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48222752</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>Vega... unfortunately kinda sucks.<p>It's not amazing at compute (despite being a member of the GCN family, which I have been a fan of since its inception), and it ended up being too expensive in both perf/$ and perf/watt.<p>The only thing it did was make Nvidia rush Series 10 out the door and make it too good. Nvidia has been unable to match the gen-to-gen uplift Series 10 delivered, all because AMD made Nvidia blink.<p>Basically, you're 2 gens too early. CDNA2/gfx90a is the minimum you need to get any meaningful inference performance, or maybe CDNA1/gfx908 if you really don't need to quantize at all.<p>BTW, I suggested this elsewhere in this HN story, but have you tried just disabling KV quant entirely? That is a huge speed uplift for compute-poor users.<p>Also, llama.cpp's support for gfx906 is probably never going to be as good as it is for other cards, and good ROCm support for cards from before AMD rebooted its driver/stack team is probably never going to materialize. I don't see the point in hanging onto them.<p>Like, if I were in your place, replacing it with even a 9060 XT, with half the RAM, would be a step up. They go for $450. People have been building dedicated inference machines with these and they've been amazing: just throw 3 or 4 in and scale VRAM to meet your needs.</p>
]]></description><pubDate>Thu, 21 May 2026 13:06:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48222008</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48222008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48222008</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>The correct answer should be "try it!"<p>But as models start packing more information into fewer bits, some weights are going to end up super important and very sensitive to quantization. So I'd just move down a Q size and stay with K_XL. I'm betting Q3_K_XL will beat Q4_K_M on any given model in real-world testing, even though it's ~20% smaller, but do worse when benchmaxxing.<p>The only exception I can think of is small models: in my testing on Gemma E2B/E4B and Qwen 3.5 9B, quantizing <i>at all</i> was super noticeable... they can't spread the error across as many weights.<p>Good news (at least for me): 24GB of VRAM is enough to hold either of those in BF16, with a ton of room left for an F16/F16 KV cache.</p>
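<p>The BF16 claim is easy to sanity-check: weight storage is roughly parameter count times bits per weight, divided by 8. This sketch assumes exactly 9e9 parameters for the 9B model, treats 24GB as 24 GiB, and ignores file-format overhead and tensors kept at other precisions; <code>weight_bytes</code> is a hypothetical helper, not part of any tool.</p>

```python
GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to hold the weights (ignores format overhead)."""
    return n_params * bits_per_weight / 8

# Assumption: a "9B" model has exactly 9e9 parameters.
bf16_9b = weight_bytes(9e9, 16)   # BF16 is 16 bits per weight
vram = 24 * GIB                   # assumption: "24GB" treated as 24 GiB
print(f"weights: {bf16_9b / GIB:.1f} GiB")
print(f"left for KV cache and activations: {(vram - bf16_9b) / GIB:.1f} GiB")
```

<p>The same helper gives a rough size comparison between quant levels if you plug in their average bits per weight, though dynamic quants vary that average layer by layer.</p>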
]]></description><pubDate>Thu, 21 May 2026 12:33:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48221627</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48221627</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48221627</guid></item><item><title><![CDATA[New comment by DiabloD3 in "GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU"]]></title><description><![CDATA[
<p>This isn't very useful.<p>The memory cost of context is not equal across models.<p>Also, Hugging Face tells you how big the model is for the exact file you have in hand, so why the weird guesswork? Dynamic quants are not going to magically fit some formula.</p>
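<p>To illustrate why context cost differs: for a standard full-attention transformer, each token adds one K and one V vector per KV head per layer, so the cache grows by roughly 2 * layers * KV heads * head dimension * bytes per element. The two configurations below are hypothetical, chosen only to show the spread; models with sliding-window or hybrid linear-attention layers deviate from this further.</p>

```python
from dataclasses import dataclass

@dataclass
class AttnConfig:
    n_layers: int     # full-attention layers that keep a KV cache
    n_kv_heads: int   # KV heads (fewer than query heads under GQA)
    head_dim: int     # dimension of each head

def kv_bytes_per_token(cfg: AttnConfig, k_bytes: float = 2.0, v_bytes: float = 2.0) -> float:
    """KV cache bytes added per token; 2.0 bytes per element is F16."""
    elems = cfg.n_layers * cfg.n_kv_heads * cfg.head_dim   # per tensor (K or V)
    return elems * (k_bytes + v_bytes)

# Hypothetical architectures, for illustration only.
model_a = AttnConfig(n_layers=64, n_kv_heads=8, head_dim=128)
model_b = AttnConfig(n_layers=48, n_kv_heads=4, head_dim=128)

ctx = 32768
for name, cfg in (("A", model_a), ("B", model_b)):
    gib = kv_bytes_per_token(cfg) * ctx / 1024 ** 3
    print(f"model {name}: {gib:.1f} GiB of F16/F16 KV cache at {ctx} tokens")
```

<p>Passing <code>v_bytes=1.0625</code> approximates a Q8_0 V cache (34 bytes per block of 32 values: one F16 scale plus 32 int8s).</p>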
]]></description><pubDate>Wed, 20 May 2026 22:44:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=48215325</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48215325</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48215325</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>These are dynamic quants, and the suffixes are basically just an indication of how far from the nominal quant the process is allowed to deviate to achieve its goal. Generally, unsloth's toolchain moves quants up, rarely down.<p>* _0 and _1 are the legacy (non-K) quants: weights are split into blocks of 32, and each block is scaled according to the original (B)F16 values. _0 stores one scale per block, derived from the block's largest-magnitude value; _1 stores a scale plus a minimum per block, so it can also represent an offset.<p>* K quants do something similar, but group blocks as subblocks inside a superblock: the superblock has its own scale and min, and each subblock's scale and min are stored relative to the superblock's, using fewer bits.<p>* K's S, M, L, and XL are just how aggressively the subblocks and their scaling factors are chosen. Generally, the level caps how far you can deviate from the chosen quant to maintain the desired quality, but the bigger levels also get a bigger budget for that excursion. XL most aggressively tries to preserve the intended quality, while S does the least.<p>* Dynamic quant on top of this scales entire layers, full of blocks, according to how much they affect various measurements (such as KLD and perplexity).<p>That said, there is no reason for anyone to still produce K_S, and the same goes for the _0, _1, and IQ4_NL types. People should no longer be using those. M is only meaningful if you're trying to restrict the upper bound: K_XL can reach BF16 for some weights, but rarely. People think this has a speed implication on hardware with native 8-bit support in its tensor units, but it doesn't.<p>Unless you're specifically trying to cure a problem, stick with K_XL.</p>
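<p>A toy illustration of the _0 vs _1 difference, written from scratch in plain Python. The functions are hypothetical and deliberately simplified (the real Q4_0 derives its scale from the signed largest-magnitude value and packs two 4-bit values per byte), so they are not bit-exact with llama.cpp's ggml code; they only show why a per-block minimum helps when a block's weights are not centered on zero.</p>

```python
import random

BLOCK = 32  # legacy _0/_1 quants group 32 consecutive weights per block

def q4_0_like(block):
    """Symmetric 4-bit sketch: one scale per block, no offset."""
    amax = max(abs(x) for x in block)
    d = amax / 7 if amax else 1.0               # map [-amax, amax] onto [-7, 7]
    q = [max(-8, min(7, round(x / d))) for x in block]
    return [qi * d for qi in q]                 # dequantized values

def q4_1_like(block):
    """Asymmetric 4-bit sketch: one scale plus one minimum per block."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi != lo else 1.0     # 16 levels spanning [lo, hi]
    q = [max(0, min(15, round((x - lo) / d))) for x in block]
    return [qi * d + lo for qi in q]

def max_abs_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

random.seed(0)
# Every value sits in a narrow positive band: the symmetric _0 scale leaves
# most of its 16 levels unused, while _1 spends all of them on the band.
block = [0.5 + 0.1 * random.random() for _ in range(BLOCK)]
err_0 = max_abs_err(block, q4_0_like(block))
err_1 = max_abs_err(block, q4_1_like(block))
print(f"_0-style max error: {err_0:.4f}")
print(f"_1-style max error: {err_1:.4f}")
```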
]]></description><pubDate>Wed, 20 May 2026 21:13:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48214213</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48214213</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48214213</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>I recommend sticking with the dense models for both Qwen and Gemma.<p>In apples-to-apples testing I've done at the same quant, with an F16/F16 (i.e., unquantized) KV cache, 35B-A3B underperforms 27B on anything even remotely complex. But yes, 35B-A3B can be 3-4x faster on my hardware.<p>By Qwen's own admission, on any meaningful benchmark (i.e., ones that involve logic, math, or tool calling), 27B performs like 122B-A10B and 397B-A17B, but 35B-A3B lands somewhere between 27B dense and 9B dense.<p>Also, MTP support recently got merged in, so I'd suggest downloading Qwen 3.6 MTP (I assume you get it from unsloth), updating your copy of llama.cpp, and adding `--spec-type draft-mtp --spec-draft-n-max 2` to your arguments.<p><a href="https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF/" rel="nofollow">https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF/</a>
<a href="https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/" rel="nofollow">https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/</a><p>Also, I recommend not quantizing the KV cache, and if you do, <i>only</i> quantize V. Lowering the model quant while also lowering context size to fit F16/F16 or F16/Q8_0 massively improves model performance for thinking models. Also, quantizing the cache, either K or V, decreases speed <i>by a lot</i> on some hardware.<p>I have a 24GB 7900 XTX, so I can fit >32k of F16/F16 context with Qwen3.6-27B by using unsloth's Q3_K_XL. This performs better than Q4_K_XL, Q5_K_XL, or Q6_K_XL with V quantized.<p>Edit: Oh, and since I mentioned Gemma 4, my testing mirrors my Qwen 3.5/3.6 experience: 26B-A4B performs worse than 31B, but is also way faster. llama.cpp doesn't support Gemma 4's MTP style yet, so both could get even faster.</p>
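<p>For intuition about what a small draft length like <code>--spec-draft-n-max 2</code> can buy: under the textbook simplification that each drafted token is accepted independently with probability alpha, one verification pass of the full model yields on average (1 - alpha^(k+1)) / (1 - alpha) tokens for k drafted tokens. This is the standard speculative-decoding estimate, not a description of llama.cpp's MTP implementation; real acceptance is neither constant nor independent, and the alpha values below are made up.</p>

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Mean tokens produced per full-model pass with k drafted tokens,
    assuming each draft is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return float(k + 1)   # every draft accepted, plus the bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# Hypothetical acceptance rates, draft length 2.
for alpha in (0.5, 0.7, 0.9):
    print(f"alpha={alpha}: {expected_tokens_per_pass(alpha, 2):.2f} tokens/pass")
```

<p>Each extra draft only pays off if every earlier draft was accepted, so gains from raising k shrink quickly as alpha drops. This also ignores the cost of producing the drafts, so it bounds tokens per pass rather than wall-clock speedup.</p>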
]]></description><pubDate>Wed, 20 May 2026 20:32:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48213694</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48213694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48213694</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Cursor Cloud Agents Down"]]></title><description><![CDATA[
<p>Because some people on HN mistakenly use AI, and when they do, they're not running inference locally.<p>What fools.</p>
]]></description><pubDate>Tue, 19 May 2026 20:32:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48199156</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48199156</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48199156</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>The purpose of a tool is to make your job easier.<p>A good editor makes your job easier. A bad editor makes your job harder. Some people use bad tools because they are familiar with them, not because they are good tools. Example: people who daily-drive VSCode and VSCode forks.<p>As I said, a bad tool makes my job harder. Examples of making my job harder include: being inconsistent; requiring extreme vigilance and oversight or it will do the wrong thing; suddenly growing new features that are also the wrong thing for the task at hand and are enabled by default; not knowing if the tool will be available when I need it (and it usually won't be, <i>exactly</i> when I need it the most), either due to fragility or poor design; using a woefully incorrect workflow for the task at hand; or even promoting and enforcing bad security practices. Example: AI, but also VSCode.<p>Like, dude, if you want to hate on editors, HN has these threads frequently, but why strawman me with Notepad? There is nothing wrong with people using Vim, Emacs, Sublime Text, TextMate, or whatever. The tool has to work, be ergonomic, and not be hostile. Just because a tool is <i>popular</i> (for the moment, at least) doesn't mean it's <i>good</i>. AI is <i>popular</i> currently, but it isn't <i>good</i>.</p>
]]></description><pubDate>Mon, 18 May 2026 10:01:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48177429</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48177429</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48177429</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>Anthropic repeatedly refuses to show their work.<p>Is this just a clever application of the harness, so it's not inherently an LLM doing it at all? Did Anthropic figure out how to take the next step that changes my mind on LLMs? Is this actually just Anthropic committing an interesting case of fraud, using some human-labor-intensive loop with LLMs? Is this not LLM inference at all, and they're only using the perplexity measurement on the input code to speed up deciding where human eyeballs look?<p>Anthropic refuses to release their model under an accepted open-weights license, and that isn't a good sign.</p>
]]></description><pubDate>Mon, 18 May 2026 07:39:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48176557</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48176557</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48176557</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>Technically yes, but this has nothing to do with LLMs.<p>You need to be able to write a good spec <i>period</i>, and this has been true as long as programming has existed. The problem is, LLMs <i>cannot write them themselves</i>, and have trouble reasoning out the unstated parts of complex problems if the spec doesn't spell them out.<p>Developers familiar with the problem space being worked on, however, <i>can</i> reason out the unstated parts, because the unstated parts are usually the bread and butter of the problem space.<p>Side note: this is often why LLMs trained on synthetic text perform weirdly or badly... the synthetic text is written by people not familiar with the thousands of individual problem spaces out there, who miss important facts or nuance.<p>LLMs trained on real text, however, are often trained without a proper license, and are essentially lossy compressed piracy archives. You're damned if you do, you're damned if you don't.</p>
]]></description><pubDate>Mon, 18 May 2026 07:32:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48176521</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48176521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48176521</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>Yes, this is what I keep telling people.<p>There are two kinds of programmers I've met: those who think they became 10x devs because of AI, and those who find AI slows them down.<p>The people who think they became 10x devs? They were missing crucial skills needed to be productive devs, and the AI is filling the gaps... instead, they should be actually learning those skills, since without them they will never be competent devs.<p>The people who think AI slows them down? Having to constantly cajole the AI into producing the relevant slop takes more time than just writing the thing in the first place.</p>
]]></description><pubDate>Mon, 18 May 2026 07:26:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=48176474</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48176474</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48176474</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>That's called a calculator, we already have those.</p>
]]></description><pubDate>Mon, 18 May 2026 07:24:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48176456</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48176456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48176456</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>This is mostly a performance by the Mistral CEO.<p>He is trying to justify the continued existence of the AI bubble in his country by claiming that, somehow, we Americans have figured it out and made LLMs work. We haven't; nobody has.<p>LLMs don't work. They cannot think. They do not understand what you are asking them to do. They statistically reproduce text written by other people, and they cannot do so well. They are not good assistants, they are not good code authors, they are not good debuggers, they cannot help you find security exploits... they can only mimic what it'd look like if they did, as long as you don't squint too hard.<p>All of the LLM startups are very quickly running out of runway, and will most likely never become profitable. OpenAI may collapse next year. Anthropic may collapse in 2028. Microsoft/GitHub seems to be pulling back on their Copilot bullshit and may just end up killing it entirely.<p>Arthur Mensch is just trying to keep Mistral alive a little bit longer until the bubble pops, and is saying whatever it takes to get a little more blood from that stone.</p>
]]></description><pubDate>Sun, 17 May 2026 17:55:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48171323</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48171323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48171323</guid></item><item><title><![CDATA[New comment by DiabloD3 in "The Whole Anthropic Kerfuffle"]]></title><description><![CDATA[
<p>That doesn't actually answer the original poster's commentary, however.<p>Humans have left Twitter; it's all propaganda and spam bots spamming and propagandizing each other.<p>José Valim needs to move to either Bluesky (if he prefers to stay within the corporate ecosystem) or Mastodon (which is where the entirety of the FOSS universe went).</p>
]]></description><pubDate>Thu, 14 May 2026 13:51:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48135384</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48135384</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48135384</guid></item><item><title><![CDATA[New comment by DiabloD3 in "Claude Account Suspended Seconds After Purchase?"]]></title><description><![CDATA[
<p>Not surprised.<p>Anthropic dogfoods their own product, and, unfortunately, this is what their product produces sometimes.</p>
]]></description><pubDate>Thu, 14 May 2026 13:47:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48135321</link><dc:creator>DiabloD3</dc:creator><comments>https://news.ycombinator.com/item?id=48135321</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48135321</guid></item></channel></rss>