Hacker News: qeternity

New comment by qeternity in "KVarN: Native vLLM backend for KV-cache quantization by Huawei"

qeternity — Sat, 06 Jun 2026 11:07:00 +0000

I think it very much is worth it!

But the point was that quality didn't magically increase.

New comment by qeternity in "KVarN: Native vLLM backend for KV-cache quantization by Huawei"

qeternity — Thu, 04 Jun 2026 17:04:42 +0000

It's not better quality: 59.3% vs 59.4% fp16 on AIME 25

New comment by qeternity in "Anthropic raises $65B in Series H funding at $965B post-money valuation"

qeternity — Thu, 28 May 2026 19:26:20 +0000

> Venture capitalists & private investors are sucking all of the possible growth and future upside from these companies and then dumping them on retail investors when there's nothing left.

A lot of the money that is deployed by VCs comes from pension funds and asset managers that ultimately manage money for the average Joe.

New comment by qeternity in "Introspective Diffusion Language Models"

qeternity — Tue, 14 Apr 2026 11:44:24 +0000

I haven't read TFA yet but a common technique is speculative decoding where a fast draft model will generate X tokens, which are then verified by the larger target model. The target model may accept some Y <= X tokens but the speedup comes from the fact that this can be done in parallel as a prefill operation due to the nature of transformers.

So let's say a draft model generates 5 tokens, all 5 of these can be verified in parallel with a single forward pass of the target model. The target model may only accept the first 4 tokens (or whatever) but as long as the 5 forward passes of the draft model + 1 prefill of the target model is faster than 4 forward passes of the target, you will have a speedup while maintaining the exact output distribution as the target.

New comment by qeternity in "Anthropic downgraded cache TTL on March 6th"

qeternity — Sun, 12 Apr 2026 18:10:13 +0000

> They paid a billion dollars for a vibe coded mess just for the opportunity to associate themselves with the hype.

Lol no they didn't. It wasn't even an acquihire. They just hired Peter.

Maybe they are paying him incredibly well, but not a billion dollars well.

New comment by qeternity in "Meta removes ads for social media addiction litigation"

qeternity — Sun, 12 Apr 2026 17:10:30 +0000

> It's not any company, its Meta and the channels they administrate come with a set of responsibilities and principles

Sorry, which laws stipulate these special responsibilities and principles?

New comment by qeternity in "Claude mixes up who said what"

qeternity — Thu, 09 Apr 2026 12:54:51 +0000

> or if the model might actually have emitted the formatting tokens that indicate a user message.

These tokens are almost universally used as stop tokens which causes generation to stop and return control to the user.

If you didn't do this, the model would happily continue generating user + assistant pairs w/o any human input.

New comment by qeternity in "Claude mixes up who said what"

qeternity — Thu, 09 Apr 2026 12:48:09 +0000

This does not solve the problem at all, it's just another bandaid that hopefully reduces the likelihood.

New comment by qeternity in "Mamba-3"

qeternity — Sat, 21 Mar 2026 08:19:38 +0000

Yes, it is written for a specific audience.

That is not a reason for snark.

As other commenters have noted, it’s well written.

New comment by qeternity in "Better JIT for Postgres"

qeternity — Wed, 04 Mar 2026 18:15:52 +0000

Yes, lots of things can create indeterminism. But nothing is inherent.

New comment by qeternity in "Better JIT for Postgres"

qeternity — Wed, 04 Mar 2026 18:15:46 +0000

Yes, lots of things can create indeterminism. But nothing is inherent.

New comment by qeternity in "Better JIT for Postgres"

qeternity — Wed, 04 Mar 2026 09:49:19 +0000

> LLMs are inherently non-deterministic.

This isn't true, and certainly not inherently so.

Changes to input leading to changes in output does not violate determinism.

New comment by qeternity in "MCP server that reduces Claude Code context consumption by 98%"

qeternity — Sun, 01 Mar 2026 12:44:55 +0000

> With prompt caching, verbose context that gets reused is basically free.

But it's not. It might be discounted cost-wise, however it will still degrade attention and make generation slower/more computationally expensive even if you have a long prefix you can reuse during prefill.

New comment by qeternity in "Google restricting Google AI Pro/Ultra subscribers for using OpenClaw"

qeternity — Mon, 23 Feb 2026 00:49:59 +0000

> Tradition warrants a negotiation phase when one party wishes to change the terms of an agreement, or becomes cognizant that the counterparty may wish to do the same.

They didn't change the agreement. One party violated it, and the other party withdrew as a result.

This is so vanilla. But people will moan because they want subsidized tokens.

New comment by qeternity in "Step 3.5 Flash – Open-source foundation model, supports deep reasoning at speed"

qeternity — Thu, 19 Feb 2026 13:54:52 +0000

Number of parameters is at least a proxy for model capability.

You can achieve incredible tok/dollar or tok/sec with Qwen3 0.6b.

It just won't be very good for most use cases.

New comment by qeternity in "Two different tricks for fast LLM inference"

qeternity — Mon, 16 Feb 2026 01:17:57 +0000

I would guess you haven't done this in practice. Yes, of course inference is memory bound at low batch sizes. This is why we run larger batch sizes!

Also there does not exist any batch size > 1 where per-request throughput is equal to bs=1. Doing any batching at all will slow all intra-batch requests down.

New comment by qeternity in "Two different tricks for fast LLM inference"

qeternity — Sun, 15 Feb 2026 10:35:24 +0000

Yes this article is full of misunderstanding. The main explanation of bottleneck is wrong: it’s the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens was the bottle neck, batching would not achieve any speed up.

When an author is confused about something so elementary, I can’t trust anything else they write.

New comment by qeternity in "Ireland rolls out basic income scheme for artists"

qeternity — Fri, 13 Feb 2026 17:24:43 +0000

Sure, and again, she should do something else then.

She isn't entitled to have a large family and work whatever job she finds fulfilling.

New comment by qeternity in "Ireland rolls out basic income scheme for artists"

qeternity — Fri, 13 Feb 2026 17:23:53 +0000

If you have any understanding of history, it's naive not to.

New comment by qeternity in "Ireland rolls out basic income scheme for artists"

qeternity — Fri, 13 Feb 2026 17:22:35 +0000

Yes, this was empirically true at the time. Things change. And that does not invalidate my comment in the least.