Hacker News: lukebechtel

New comment by lukebechtel in "An update on recent Claude Code quality reports"

lukebechtel — Thu, 23 Apr 2026 18:36:20 +0000

Some people seem to be suggesting these are cover-ups for quantization...

Those who work on agent harnesses for a living realize how sensitive models can be to even minor changes in the prompt.

I would suspect harness changes long before I would suspect quantization.

New comment by lukebechtel in "Show HN: A game where you build a GPU"

lukebechtel — Sat, 04 Apr 2026 20:46:46 +0000

really fun :) thanks!

New comment by lukebechtel in "Cursor 3"

lukebechtel — Thu, 02 Apr 2026 19:07:07 +0000

it sounds like you described it pretty well!

New comment by lukebechtel in "Anatomy of the .claude/ folder"

lukebechtel — Sat, 28 Mar 2026 00:26:48 +0000

~/.claude/projects is where the real fun is :)

New comment by lukebechtel in "Autoresearch on an old research idea"

lukebechtel — Mon, 23 Mar 2026 23:47:39 +0000

What is your domain?

New comment by lukebechtel in "Reports of code's death are greatly exaggerated"

lukebechtel — Mon, 23 Mar 2026 06:53:47 +0000

so we need to make some crazy llms...

New comment by lukebechtel in "IMG_0416 (2024)"

lukebechtel — Fri, 13 Mar 2026 05:35:44 +0000

there used to be https://default-filename-tv.neocities.org/ but it got taken down :/

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Wed, 11 Mar 2026 16:43:20 +0000

The bitter lesson strikes again, I suppose!

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Wed, 11 Mar 2026 16:42:45 +0000

Good questions! It's clear I need to gather more metrics from our next generated inference library.

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Wed, 11 Mar 2026 16:41:43 +0000

Unfortunately, it hasn't been open-sourced. We're debating how and when to do this right now.

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Wed, 11 Mar 2026 03:00:48 +0000

This is a fair critique! We plan to use our system to generate many more inference libraries of this nature, and I'll make it a point to release better, broader correctness measures when we do so.

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Tue, 10 Mar 2026 21:49:08 +0000

Yes, great question!

The system started without paged attention and automatically recreated its own paged attention implementation once it realized that the lack of it was a bottleneck.

Pretty cool!

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Tue, 10 Mar 2026 20:23:04 +0000

Unfortunately, not at present; we went with FP8 because we believed it was generally the best tradeoff between quality and speed. It also allowed faster iteration.

We believe our improvements would hold on BF16, but let me check.

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Tue, 10 Mar 2026 19:03:32 +0000

Yes, speculative decoding would make both us and vLLM faster, but we believe it would be a roughly even bump on both sides, so we didn't include it in this comparison. Worth another test!

New comment by lukebechtel in "Surpassing vLLM with a Generated Inference Stack"

lukebechtel — Tue, 10 Mar 2026 19:02:11 +0000

We currently validate with MMLU and HellaSwag, and are getting this independently verified by a third party.

We have considered open-sourcing some of our optimized inference libraries in the future, but have not yet come to a decision on this.

Also, if you want a rough intuition for why this is possible: this entire inference stack was built for exactly one model, so we could tune the whole framework specifically for it.

Surpassing vLLM with a Generated Inference Stack

lukebechtel — Tue, 10 Mar 2026 15:12:52 +0000

Article URL: https://infinity.inc/case-studies/qwen3-optimization

Comments URL: https://news.ycombinator.com/item?id=47324364

Points: 62

# Comments: 22

New comment by lukebechtel in "Claude Code Remote Control"

lukebechtel — Wed, 25 Feb 2026 16:09:48 +0000

I also do this!

New comment by lukebechtel in "Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI"

lukebechtel — Fri, 20 Feb 2026 20:44:03 +0000

Thank you Georgi <3

New comment by lukebechtel in "Gemini 3.1 Pro"

lukebechtel — Fri, 20 Feb 2026 03:00:40 +0000

sonnet 4.6 is a third, and equivalent to opus 4.5, which is enough for me usually :)

EDIT: Gemini does have 1m context for "free" though so that's great.

New comment by lukebechtel in "Gemini 3 Deep Think"

lukebechtel — Thu, 12 Feb 2026 17:06:05 +0000

ARC-AGI-2: 84.6% (vs 68.8% for Opus 4.6)

Wow.

https://blog.google/innovation-and-ai/models-and-research/ge...