Hacker News: dot_treo

New comment by dot_treo in "Map of Metal"

dot_treo — Wed, 20 May 2026 12:18:11 +0000

Reminds me very much of https://music.ishkur.com/ which is the same kind of thing but for electronic music.

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 21:51:05 +0000

By the looks of it, it will take a couple more follow-up PRs to clean things up and get the most performance out of MTP. I hope that by then it will be easier to add more speculative decoding types.

In the meantime I've benchmarked Orthrus some more and got some quite promising results. So I'd be glad if my prediction that it may take some time until it lands in llama.cpp turns out to be wrong.

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 19:45:39 +0000

And it also looks like the original authors are working on qwen 3.5 too: https://github.com/chiennv2000/orthrus/issues/1#issuecomment...

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 19:16:19 +0000

I would probably treat each group of (3 GatedDeltaNet + 1 GatedAttention) blocks as one transformer block. When generating the next steps, one would then use the KV cache for the gated attention and skip the delta nets entirely.

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 18:17:19 +0000

I've tried MTP, and that got me about 1.5x on average with a very spec friendly benchmark.

I didn't run the full benchmark with the demo code, I just picked a single prompt from it. The prompt is about 1300 tokens, the response is about 3200 tokens.

Baseline: 44.8 t/s

With Orthrus: 164.6 t/s
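For a sense of scale, the two throughput figures above work out as follows. This is only arithmetic on the numbers quoted in this comment; the variable names are made up for illustration, and the wall-clock estimate assumes the roughly 3200-token response was generated at a constant rate.

```python
# Figures quoted in the comment (single prompt, not the full benchmark).
baseline_tps = 44.8      # tokens per second without Orthrus
orthrus_tps = 164.6      # tokens per second with Orthrus
response_tokens = 3200   # approximate response length

# Relative speedup of generation throughput.
speedup = orthrus_tps / baseline_tps

# Approximate generation time for the response, assuming a constant rate.
baseline_seconds = response_tokens / baseline_tps
orthrus_seconds = response_tokens / orthrus_tps

print(f"speedup: {speedup:.2f}x")
print(f"baseline: {baseline_seconds:.1f} s, with Orthrus: {orthrus_seconds:.1f} s")
```

That is roughly a 3.7x speedup on this one prompt, so it says nothing about the average over a full benchmark.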

Note: Don't use the `use_diffusion_mode=` config flag in their example to collect a baseline. Something about how it falls back to "normal" decoding makes it slow to a crawl.

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 09:27:56 +0000

Just getting it into a GGUF file would be fairly trivial. But actually using that GGUF file would need a number of additional things: one would need to create a new architecture derived from Qwen3, and then probably adapt the speculative decoding functionality.

At the moment not even MTP is merged into llama.cpp, so I wouldn't quite hold my breath for it.

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 08:55:07 +0000

It is all about moving the bottleneck. During prompt processing everything can be calculated in parallel, while during token generation you create a single token at a time. For example, using an RTX 4000 Ada, I'm getting 2700 t/s for prompt processing, and 48 t/s for token generation using an 8B class model.

Their approach is essentially a form of speculative decoding, where multiple tokens are predicted at once and then verified. Because the verification runs in parallel like prompt processing, tokens get generated at a speed that is closer to the prompt processing speed.
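To make the predict-then-verify idea concrete, here is a minimal toy sketch of generic greedy speculative decoding. It is not the Orthrus method and not llama.cpp code: `target_model`, `draft_model`, and both decode functions are hypothetical stand-ins, and the "models" are tiny deterministic functions over integers. The point it illustrates is that the verified output matches plain greedy decoding while needing fewer verification rounds than tokens.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: a fixed deterministic rule."""
    return (ctx[-1] * 3 + 1) % 7


def draft_model(ctx: List[Token]) -> Token:
    """Hypothetical cheap predictor: agrees with the target except after token 5."""
    return 0 if ctx[-1] == 5 else target_model(ctx)


def greedy_decode(target: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain autoregressive decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (new_tokens, verification_rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1. Cheaply propose k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verify. A real implementation would score all positions in one
        #    batched target pass; this toy calls the target once per position.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            # Every proposed token matched, so the target's next token comes for free.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_new], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode(target_model, draft_model, [1], 20)
    print(tokens == greedy_decode(target_model, [1], 20), rounds)
```

Each round costs about one parallel pass of the big model, which is why the gain depends on how cheap parallel processing is relative to single-token generation.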

It seems to be special because their approach yields the exact same output distribution as the base model and only takes a negligible amount of additional memory.

The main catch is that if your prompt processing speed is already bad, it will not help you all that much.

For example, the M-series Macs (up to M4) have a relatively high generation speed compared to their prompt processing speed. That means they will not benefit as much (if at all). With the M5 the prompt processing speed has increased 4x, so those can expect to see a good uplift.

New comment by dot_treo in "Orthrus-Qwen3: up to 7.8×tokens/forward on Qwen3, identical output distribution"

dot_treo — Sat, 16 May 2026 08:06:49 +0000

Do you plan on releasing the training code?

New comment by dot_treo in "Germany suspends military approval for long stays abroad for men under 45"

dot_treo — Thu, 16 Apr 2026 15:03:33 +0000

True, but how many 18 year olds do you know that will just randomly get their balls checked?

New comment by dot_treo in "Germany suspends military approval for long stays abroad for men under 45"

dot_treo — Thu, 16 Apr 2026 08:14:28 +0000

It used to be that way, and probably will be that way again. I know of a few people who got an early testicular cancer diagnosis that way. So it seems that there is a medical use for it.

New comment by dot_treo in "Do LLMs Break the Sapir-Whorf Hypothesis?"

dot_treo — Tue, 31 Mar 2026 19:07:33 +0000

I don't care too much about the article being written with LLM support. There is actual work being done that is being showcased here. I'd rather read an LLM version of it than not learn about those things at all.

New comment by dot_treo in "Do LLMs Break the Sapir-Whorf Hypothesis?"

dot_treo — Tue, 31 Mar 2026 19:00:27 +0000

The linguistic argument is fascinating.

One particular thing, unrelated to the linguistic argument itself, stood out to me. In the PCA visualisation, we can see that some sequences of layers have particularly tight and stationary clusters. Incidentally, those are also exactly the layers that the previous RYS post identified as most useful to repeat to improve performance on the probes.

I wonder if that correlation could be used to identify good candidates for repeating layers.

Do LLMs Break the Sapir-Whorf Hypothesis?

dot_treo — Tue, 31 Mar 2026 18:55:18 +0000

Article URL: https://dnhkng.github.io/posts/sapir-whorf/

Comments URL: https://news.ycombinator.com/item?id=47591834

Points: 15

# Comments: 7

New comment by dot_treo in "My minute-by-minute response to the LiteLLM malware attack"

dot_treo — Thu, 26 Mar 2026 16:22:45 +0000

Looks like we discovered it at essentially the same time, and in essentially the same way. If the pth file hadn't triggered fork-bomb-like behavior, this might have stayed undiscovered for quite a bit longer.

Good thinking on asking Claude to walk you through who to contact. I had no idea how to contact anyone related to PyPI, so I started by shooting an email to the maintainers and posting it on Hacker News.

While I'm not part of the security community, I think everyone who finds something like this should be able to report it. There is no point in gatekeeping the reporting of serious security vulnerabilities.

New comment by dot_treo in "Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised"

dot_treo — Wed, 25 Mar 2026 11:41:18 +0000

It actually wasn't. That was one of the reasons why I looked into what was changed. Even 1.82.6 has only been an RC release on GitHub since just before the incident.

So the fact that 1.82.7 and then 1.82.8 were released within an hour of each other was highly suspicious.

New comment by dot_treo in "Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised"

dot_treo — Tue, 24 Mar 2026 13:30:49 +0000

Yeah, that release has the base64 blob, but it didn't contain the pth file that auto-triggers the malware on import.
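Some background on why a pth file matters: Python's `site` module processes `.pth` files in site-packages at interpreter startup and executes any line that begins with `import` followed by a space or tab. Below is a minimal sketch of a scanner for such lines. The function name `find_executable_pth_lines` is made up for illustration, it is not part of any tool mentioned here, and it says nothing about what the LiteLLM payload actually looked like.

```python
from pathlib import Path
from typing import List, Tuple


def find_executable_pth_lines(site_dir: str) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for every .pth line that
    Python's site module would execute rather than treat as a path entry."""
    hits: List[Tuple[str, int, str]] = []
    for pth in sorted(Path(site_dir).glob("*.pth")):
        text = pth.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # site.py executes lines starting with "import " or "import\t".
            if line.startswith(("import ", "import\t")):
                hits.append((pth.name, lineno, line.strip()))
    return hits
```

Running it over each directory returned by `site.getsitepackages()` lists the candidates. Some legitimate packages ship such lines too, so a hit is a reason to read the line, not proof of compromise.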

New comment by dot_treo in "Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised"

dot_treo — Tue, 24 Mar 2026 13:10:48 +0000

Even just having an import statement for it is enough to trigger the malware in 1.82.8.

Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised

dot_treo — Tue, 24 Mar 2026 12:06:29 +0000

About an hour ago, new versions were deployed to PyPI.

I was just setting up a new project, and things behaved weirdly. My laptop ran out of RAM; it looked like a fork bomb was running.

I investigated and found that a base64-encoded blob has been added to proxy_server.py.

It writes and decodes another file which it then runs.

I'm in the process of reporting this upstream, but wanted to give everyone here a heads-up.

It is also reported in this issue: https://github.com/BerriAI/litellm/issues/24512

Comments URL: https://news.ycombinator.com/item?id=47501426

Points: 938

# Comments: 500

New comment by dot_treo in "Push events into a running session with channels"

dot_treo — Fri, 20 Mar 2026 11:11:49 +0000

The main reason is just how hard it is to actually create anything that integrates with Teams. You have to jump through so many hoops, wade through so many deprecated APIs, and guess your way through so many half-way-wrong-by-now documentation pages.

After building a proof of concept, we decided that we will only continue the Teams integration if someone is going to pay serious money for it.

New comment by dot_treo in "Indifference is a power (2015)"

dot_treo — Tue, 13 Jan 2026 14:56:03 +0000

In the past I tried to adopt the stoic mindset, but always struggled. Still, I continued to read and learn about it.

Unrelatedly, I came across a recommendation for David Burns' "Feeling Good" here on Hacker News a couple of years ago.

Reading it with my interest in stoicism in mind, I honestly found it to be probably the best modern day handbook to actually adopting the stoic mindset - without ever mentioning it.

As far as I understand stoicism, it is all about seeing things as they are, and understanding that the only thing that we really control is our reaction / interpretation of events. And the CBT approach that is explained in Feeling Good/Feeling Great is exactly how you do this.

With this perspective, Marcus Aurelius' Meditations suddenly make a lot more sense. They are his therapy homework.