<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dot_treo</title><link>https://news.ycombinator.com/user?id=dot_treo</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 20 May 2026 19:02:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dot_treo" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dot_treo in "Map of Metal"]]></title><description><![CDATA[
<p>Reminds me very much of <a href="https://music.ishkur.com/" rel="nofollow">https://music.ishkur.com/</a> which is the same kind of thing but for electronic music.</p>
]]></description><pubDate>Wed, 20 May 2026 12:18:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206493</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48206493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206493</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>By the looks of it, it will take a couple more follow-up PRs to clean things up a bit and get the most performance out of MTP. I hope that by that point it will be easier to add more speculative decoding types.<p>In the meantime I've benchmarked Orthrus some more and got some quite promising results, so I'd be glad if my prediction that it may take some time to land in llama.cpp turns out to be wrong.</p>
]]></description><pubDate>Sat, 16 May 2026 21:51:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48164114</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48164114</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48164114</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>It also looks like the original authors are working on Qwen 3.5:
<a href="https://github.com/chiennv2000/orthrus/issues/1#issuecomment-4467775779" rel="nofollow">https://github.com/chiennv2000/orthrus/issues/1#issuecomment...</a></p>
]]></description><pubDate>Sat, 16 May 2026 19:45:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48163154</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48163154</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48163154</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>I would probably treat each (3 GatedDeltaNet + 1 GatedAttention) group as one transformer block. When generating the next steps, one would then use the KV cache for the gated attention and skip the delta nets entirely.</p>
]]></description><pubDate>Sat, 16 May 2026 19:16:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=48162931</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48162931</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48162931</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>I've tried MTP, and that got me about 1.5x on average with a very speculation-friendly benchmark.<p>I didn't run the full benchmark with the demo code, just picked a single prompt from it. The prompt is about 1300 tokens, the response about 3200 tokens.<p>Baseline: 44.8 t/s
With Orthrus: 164.6 t/s<p>Note: Don't use the `use_diffusion_mode=` config flag in their example to collect a baseline. Something about the fallback to "normal" mode makes it slow to a crawl.</p>
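<p>For reference, here is the arithmetic behind those two numbers as a tiny Python sketch. The helper names are made up for illustration. Generation time assumes a constant rate over the ~3200-token response and ignores prompt processing.</p>

```python
# Throughput figures from the comment above, in tokens per second.
BASELINE_TPS = 44.8
ORTHRUS_TPS = 164.6
RESPONSE_TOKENS = 3200  # approximate length of the generated response


def speedup(baseline_tps: float, improved_tps: float) -> float:
    """Throughput ratio of the improved run over the baseline."""
    return improved_tps / baseline_tps


def generation_seconds(tokens: int, tps: float) -> float:
    """Time to generate `tokens` at a constant rate (prompt processing ignored)."""
    return tokens / tps


if __name__ == "__main__":
    print(f"speedup:  {speedup(BASELINE_TPS, ORTHRUS_TPS):.2f}x")  # 3.67x
    print(f"baseline: {generation_seconds(RESPONSE_TOKENS, BASELINE_TPS):.1f} s")  # 71.4 s
    print(f"orthrus:  {generation_seconds(RESPONSE_TOKENS, ORTHRUS_TPS):.1f} s")  # 19.4 s
```

<p>That is roughly a 3.7x speedup: about 71 s versus 19 s of pure generation time for that response.</p>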
]]></description><pubDate>Sat, 16 May 2026 18:17:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=48162472</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48162472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48162472</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>Getting it into a GGUF file would be fairly trivial, but actually using that GGUF file would need a bunch of additional work. One would need to create a new architecture derived from Qwen3 and then probably adapt the speculative decoding functionality.<p>At the moment not even MTP is merged into llama.cpp, so I wouldn't hold my breath.</p>
]]></description><pubDate>Sat, 16 May 2026 09:27:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48158491</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48158491</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48158491</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>It is all about moving the bottleneck. During prompt processing, all tokens can be computed in parallel, while during token generation you produce one token at a time. For example, with an 8B-class model on an RTX 4000 Ada, I get 2700 t/s for prompt processing and 48 t/s for token generation.<p>Their approach is essentially speculative decoding: multiple tokens are predicted at once and then verified together, so tokens get generated at a speed closer to the prompt processing speed.<p>What seems special is that their approach yields exactly the same output distribution as the base model and needs only a negligible amount of additional memory.<p>The main catch is that if your prompt processing speed is already bad, it will not help you all that much.<p>For example, M-series Macs (up to the M4) have relatively high generation speed compared to their prompt processing speed, so they will not benefit as much (if at all). With the M5, prompt processing speed has increased 4x, so those machines can expect a good uplift.</p>
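<p>Below is a minimal sketch of the general draft-then-verify idea. It assumes greedy decoding and uses toy integer "models"; toy_target, toy_draft and the other names are hypothetical. This is generic speculative decoding, not Orthrus' actual method.</p>
<p>Every accepted token is one the target model would have picked anyway, which is why the output matches plain greedy decoding exactly. Matching a full sampling distribution would need a rejection-sampling acceptance rule instead of the exact-match check used here.</p>

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to the next token id.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain greedy decoding, one target call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-then-verify decoding; returns (sequence, verification rounds).

    In a real system each round's verification is a single batched forward pass
    of the target model over all drafted positions; this sketch simply loops.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. Cheaply draft up to k tokens ahead.
        drafted: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            drafted.append(draft(seq + drafted))
        # 2. Verify: keep drafted tokens while they match the target's own choice.
        accepted: List[int] = []
        for tok in drafted:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # the target's token replaces the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft matched; the same verification pass yields one bonus token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, rounds


def toy_target(seq: List[int]) -> int:
    """Hypothetical stand-in for the large model."""
    return (seq[-1] * 3 + len(seq)) % 11


def toy_draft(seq: List[int]) -> int:
    """Hypothetical cheap drafter that is wrong whenever len(seq) is a multiple of 5."""
    guess = toy_target(seq)
    return guess if len(seq) % 5 else (guess + 1) % 11


if __name__ == "__main__":
    prompt = [1, 2, 3]
    out, rounds = speculative_decode(toy_target, toy_draft, prompt, n_new=20)
    assert out == greedy_decode(toy_target, prompt, n_new=20)
    print(rounds)  # 5 verification rounds instead of 20 sequential steps
```

<p>The speedup comes from each round committing several tokens for roughly the cost of one parallel target pass. That is only a win if the hardware's parallel (prompt-processing-like) throughput is much higher than its sequential generation speed.</p>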
]]></description><pubDate>Sat, 16 May 2026 08:55:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48158281</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48158281</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48158281</guid></item><item><title><![CDATA[New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"]]></title><description><![CDATA[
<p>Do you plan on releasing the training code?</p>
]]></description><pubDate>Sat, 16 May 2026 08:06:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48157946</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=48157946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48157946</guid></item><item><title><![CDATA[New comment by dot_treo in "Germany suspends military approval for long stays abroad for men under 45"]]></title><description><![CDATA[
<p>True, but how many 18-year-olds do you know who would just randomly get their balls checked?</p>
]]></description><pubDate>Thu, 16 Apr 2026 15:03:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47794208</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47794208</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47794208</guid></item><item><title><![CDATA[New comment by dot_treo in "Germany suspends military approval for long stays abroad for men under 45"]]></title><description><![CDATA[
<p>It used to be that way, and probably will be again. I know a few people who got an early testicular cancer diagnosis that way, so there does seem to be a medical use for it.</p>
]]></description><pubDate>Thu, 16 Apr 2026 08:14:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47790145</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47790145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47790145</guid></item><item><title><![CDATA[New comment by dot_treo in "Do LLMs Break the Sapir-Whorf Hypothesis?"]]></title><description><![CDATA[
<p>I don't care too much about the article being written with LLM support. There is actual work being showcased here, and I'd rather read an LLM-assisted write-up of it than not learn about those things at all.</p>
]]></description><pubDate>Tue, 31 Mar 2026 19:07:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47592000</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47592000</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47592000</guid></item><item><title><![CDATA[New comment by dot_treo in "Do LLMs Break the Sapir-Whorf Hypothesis?"]]></title><description><![CDATA[
<p>The linguistic argument is fascinating.<p>One thing, unrelated to the linguistic argument itself, stood out to me. In the PCA visualisation, some sequences of layers have particularly tight and stationary clusters. Interestingly, those are exactly the layers that the previous RYS post identified as most useful to repeat for improving performance on the probes.<p>I wonder if that correlation could be used to identify good candidates for layer repetition.</p>
]]></description><pubDate>Tue, 31 Mar 2026 19:00:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47591909</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47591909</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47591909</guid></item><item><title><![CDATA[Do LLMs Break the Sapir-Whorf Hypothesis?]]></title><description><![CDATA[
<p>Article URL: <a href="https://dnhkng.github.io/posts/sapir-whorf/">https://dnhkng.github.io/posts/sapir-whorf/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47591834">https://news.ycombinator.com/item?id=47591834</a></p>
<p>Points: 15</p>
<p># Comments: 7</p>
]]></description><pubDate>Tue, 31 Mar 2026 18:55:18 +0000</pubDate><link>https://dnhkng.github.io/posts/sapir-whorf/</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47591834</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47591834</guid></item><item><title><![CDATA[New comment by dot_treo in "My minute-by-minute response to the LiteLLM malware attack"]]></title><description><![CDATA[
<p>Looks like we discovered it at essentially the same time, and in essentially the same way. If the .pth file hadn't triggered fork-bomb-like behavior, this might have stayed undiscovered for quite a bit longer.<p>Good thinking on asking Claude to walk you through whom to contact. I had no idea how to reach anyone at PyPI, so I started by emailing the maintainers and posting it on Hacker News.<p>While I'm not part of the security community, I think everyone who finds something like this should be able to report it. There is no point in gatekeeping the reporting of serious security vulnerabilities.</p>
]]></description><pubDate>Thu, 26 Mar 2026 16:22:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47532444</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47532444</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47532444</guid></item><item><title><![CDATA[New comment by dot_treo in "Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised"]]></title><description><![CDATA[
<p>It actually wasn't. That was one of the reasons I looked into what had changed. Even 1.82.6 has only been an RC release on GitHub since just before the incident.<p>So the fact that 1.82.7 and then 1.82.8 were released within an hour of each other was highly suspicious.</p>
]]></description><pubDate>Wed, 25 Mar 2026 11:41:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47516009</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47516009</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47516009</guid></item><item><title><![CDATA[New comment by dot_treo in "Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised"]]></title><description><![CDATA[
<p>Yeah, that release has the base64 blob, but it didn't contain the .pth file that automatically triggers the malware on import.</p>
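<p>For anyone wanting to check their own environments: Python's site module processes .pth files in site-packages at interpreter startup. Any line in them that starts with "import" followed by a space or tab is executed.</p>
<p>Below is a rough audit sketch that lists such lines. The function names are my own for illustration, not part of any existing tool.</p>

```python
import site
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def executable_pth_lines(dirs: Iterable[str]) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for every .pth line Python would execute.

    Per the site module documentation, a .pth line starting with "import"
    followed by a space or tab is executed when the interpreter starts.
    """
    hits: List[Tuple[Path, int, str]] = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue  # skip directories that don't exist in this environment
        for pth in sorted(base.glob("*.pth")):
            text = pth.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if line.startswith(("import ", "import\t")):
                    hits.append((pth, lineno, line.strip()))
    return hits


def default_site_dirs() -> List[str]:
    """Site-packages directories of the running interpreter, plus the user site dir."""
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    dirs.append(site.getusersitepackages())
    return dirs


if __name__ == "__main__":
    for path, lineno, line in executable_pth_lines(default_site_dirs()):
        # Truncate long lines (e.g. embedded payloads) to keep the output readable.
        print(f"{path}:{lineno}: {line[:120]}", file=sys.stdout)
```

<p>Expect some legitimate hits, for example from editable installs or setuptools. The output needs a manual look rather than being treated as a verdict.</p>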
]]></description><pubDate>Tue, 24 Mar 2026 13:30:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47502318</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47502318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47502318</guid></item><item><title><![CDATA[New comment by dot_treo in "Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised"]]></title><description><![CDATA[
<p>Even just having an import statement for it is enough to trigger the malware in 1.82.8.</p>
]]></description><pubDate>Tue, 24 Mar 2026 13:10:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47502067</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47502067</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47502067</guid></item><item><title><![CDATA[Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised]]></title><description><![CDATA[
<p>About an hour ago, new versions were deployed to PyPI.<p>I was just setting up a new project and things behaved weirdly: my laptop ran out of RAM, and it looked like a fork bomb was running.<p>I investigated and found that a base64-encoded blob had been added to proxy_server.py.<p>It decodes and writes out another file, which it then runs.<p>I'm in the process of reporting this upstream, but wanted to give everyone here a heads-up.<p>It is also reported in this issue:
<a href="https://github.com/BerriAI/litellm/issues/24512" rel="nofollow">https://github.com/BerriAI/litellm/issues/24512</a></p>
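<p>Below is a crude heuristic for spotting similar payloads in a Python file. It assumes the blob is a single long string literal. The 200-character threshold and the function name are arbitrary choices for illustration.</p>

```python
import ast
import base64
import binascii
import re
from typing import List, Tuple

MIN_LEN = 200  # arbitrary threshold; short base64-looking strings are common and harmless
B64_RE = re.compile(r"[A-Za-z0-9+/=\s]+")


def find_base64_blobs(source: str, min_len: int = MIN_LEN) -> List[Tuple[int, int]]:
    """Return (line number, decoded byte length) for long string literals that decode as base64."""
    hits: List[Tuple[int, int]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            s = node.value
            if len(s) >= min_len and B64_RE.fullmatch(s):
                try:
                    # validate=True rejects non-alphabet characters and bad padding.
                    decoded = base64.b64decode("".join(s.split()), validate=True)
                except (binascii.Error, ValueError):
                    continue
                hits.append((node.lineno, len(decoded)))
    return hits


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for lineno, size in find_base64_blobs(f.read()):
                print(f"{path}:{lineno}: base64-like literal decoding to {size} bytes")
```

<p>It will flag legitimate embedded data too, so treat hits as leads to inspect, not proof of compromise.</p>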
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47501426">https://news.ycombinator.com/item?id=47501426</a></p>
<p>Points: 938</p>
<p># Comments: 500</p>
]]></description><pubDate>Tue, 24 Mar 2026 12:06:29 +0000</pubDate><link>https://github.com/BerriAI/litellm/issues/24512</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47501426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47501426</guid></item><item><title><![CDATA[New comment by dot_treo in "Push events into a running session with channels"]]></title><description><![CDATA[
<p>The main reason is just how hard it is to build anything that integrates with Teams. You have to jump through so many hoops, wade through so many deprecated APIs, and guess your way through so many half-wrong-by-now documentation pages.<p>After building a proof of concept, we decided to continue the Teams integration only if someone is going to pay serious money for it.</p>
]]></description><pubDate>Fri, 20 Mar 2026 11:11:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47453020</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=47453020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47453020</guid></item><item><title><![CDATA[New comment by dot_treo in "Indifference is a power (2015)"]]></title><description><![CDATA[
<p>In the past I tried to adopt the stoic mindset but always struggled. Still, I continued to read and learn about it.<p>Separately, I came across a recommendation for David Burns' "Feeling Good" here on Hacker News a couple of years ago.<p>Reading it with my interest in stoicism in mind, I found it to be probably the best modern-day handbook for actually adopting the stoic mindset, without ever mentioning it.<p>As far as I understand stoicism, it is all about seeing things as they are and understanding that the only thing we really control is our reaction to, or interpretation of, events. The CBT approach explained in Feeling Good/Feeling Great is exactly how you do this.<p>From this perspective, Marcus Aurelius' Meditations suddenly makes a lot more sense: it is his therapy homework.</p>
]]></description><pubDate>Tue, 13 Jan 2026 14:56:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=46601784</link><dc:creator>dot_treo</dc:creator><comments>https://news.ycombinator.com/item?id=46601784</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46601784</guid></item></channel></rss>