Hacker News: vrm

New comment by vrm in "Building durable workflows on Postgres"

vrm — Thu, 28 May 2026 21:26:40 +0000

TBH it's intended only for internal use (we don't even publish it as a crate at this point) so I don't particularly mind it being low-key. But I appreciate it!

New comment by vrm in "Building durable workflows on Postgres"

vrm — Thu, 28 May 2026 20:27:34 +0000

If you don't need a ton of throughput, I think `absurd` (and our Rust derivative `durable`) are very nice options that keep the client side extremely simple. They're also lightweight enough that a coding agent can easily keep the entire thing in its head and just run queries to look up state as needed.

New comment by vrm in "Building durable workflows on Postgres"

vrm — Thu, 28 May 2026 18:59:40 +0000

Since DBOS doesn't support Rust, we implemented a very minimal Rust version of this at https://github.com/tensorzero/durable. It has been quite stable and extensible but of course you need to be very careful with the SQL implementations. Hope this is interesting to readers here.

ATLAS: Autoformalized Textbook Library At Scale

vrm — Thu, 28 May 2026 16:40:18 +0000

https://twitter.com/arnal_charles/status/2060009395107377282, https://xcancel.com/arnal_charles/status/2060009395107377282

Paper: Formalizing Mathematics at Scale - https://arxiv.org/abs/2605.29955

Comments URL: https://news.ycombinator.com/item?id=48311485

Points: 32

# Comments: 4

New comment by vrm in "Making deep learning go brrrr from first principles (2022)"

vrm — Sat, 23 May 2026 16:41:06 +0000

It’s really not a concept you can express very easily in idiomatic Python. The cost comes from the generated assembly copying data from global GPU memory into registers (slow, and the bandwidth saturates quickly) and back again in between the two cosines. If you can avoid that intermediate round trip, the cost drops by approximately half.
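To make the round trip concrete, here is a rough CPU-side analogy in Rust, not GPU code. The function names and the byte-counting model (every pass reads and writes each element once, and nothing stays in cache) are assumptions for illustration only.

```rust
/// Unfused: two passes over memory, with an intermediate buffer
/// that is written out and then read back in.
fn cos_cos_unfused(x: &[f32]) -> Vec<f32> {
    // Pass 1: read x, write tmp.
    let tmp: Vec<f32> = x.iter().map(|v| v.cos()).collect();
    // Pass 2: read tmp, write the result.
    tmp.iter().map(|v| v.cos()).collect()
}

/// Fused: one pass. The intermediate cos(x) never leaves a register.
fn cos_cos_fused(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.cos().cos()).collect()
}

/// Bytes moved under the simplified model: each pass reads n floats
/// and writes n floats. This is an assumption, not a measurement.
fn bytes_moved(n: usize, passes: usize) -> usize {
    passes * 2 * n * std::mem::size_of::<f32>()
}

fn main() {
    let x: Vec<f32> = (0..1024).map(|i| i as f32 * 0.01).collect();
    let a = cos_cos_unfused(&x);
    let b = cos_cos_fused(&x);
    // Both versions compute the same values.
    assert!(a.iter().zip(&b).all(|(p, q)| (p - q).abs() < 1e-6));

    let n = x.len();
    println!("unfused traffic: {} bytes", bytes_moved(n, 2));
    println!("fused traffic:   {} bytes", bytes_moved(n, 1));
}
```

Under this model the fused version moves half the bytes, which is where the "approximately half" figure comes from when the operation is bandwidth-bound.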

New comment by vrm in "Formal Verification Gates for AI Coding Loops"

vrm — Wed, 20 May 2026 21:05:05 +0000

One question I have here: I think this type of thing would be trivial to do in Rust with constructors, private fields, and newtypes. What am I getting on top of it?
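For reference, a minimal sketch of the Rust pattern I mean. `Percent` and `remaining` are made-up names for illustration and are not taken from the article.

```rust
mod validated {
    /// A percentage guaranteed to lie in 0..=100.
    /// The inner field is private, so code outside this module
    /// cannot build a `Percent` without going through `new`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Percent(u8);

    impl Percent {
        /// The only constructor: rejects out-of-range values.
        pub fn new(value: u8) -> Result<Self, String> {
            if value <= 100 {
                Ok(Percent(value))
            } else {
                Err(format!("{value} is not a valid percentage"))
            }
        }

        /// Read-only access to the validated value.
        pub fn get(self) -> u8 {
            self.0
        }
    }
}

use validated::Percent;

/// Callers never re-check the range: holding a `Percent` is the proof.
/// The subtraction cannot underflow because `get()` is at most 100.
fn remaining(done: Percent) -> u8 {
    100 - done.get()
}

fn main() {
    let p = Percent::new(40).expect("40 is in range");
    println!("remaining: {}", remaining(p));
    assert!(Percent::new(150).is_err());
    // Writing `Percent(150)` here would not compile: the field is private.
}
```

The invariant is enforced once, at construction, and the compiler stops any other code path from producing an invalid value.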

Stop comparing price per million tokens: the hidden LLM API costs

vrm — Thu, 16 Apr 2026 19:46:29 +0000

Article URL: https://www.tensorzero.com/blog/stop-comparing-price-per-million-tokens-the-hidden-llm-api-costs/

Comments URL: https://news.ycombinator.com/item?id=47798525

Points: 3

# Comments: 2

Ask HN: What do you recommend for test observability?

vrm — Fri, 12 Sep 2025 13:52:54 +0000

I maintain an OSS project with a very involved CI setup. We're at the point where it is worth having observability into which tests are flaky, especially within intra-test-run retries. An ideal solution would be a managed service that takes junit.xml exports from cargo nextest, vitest, playwright, pytest, and go test. What do you all recommend?

Comments URL: https://news.ycombinator.com/item?id=45222181

Points: 3

# Comments: 0

Improving Cursor Tab with RL

vrm — Fri, 12 Sep 2025 03:19:08 +0000

Article URL: https://cursor.com/en/blog/tab-rl

Comments URL: https://news.ycombinator.com/item?id=45218365

Points: 6

# Comments: 0

New comment by vrm in "Databricks is raising a Series K Investment at >$100B valuation"

vrm — Wed, 20 Aug 2025 14:31:08 +0000

That is earnings (net income), not revenue (top line), so these are wildly different and incomparable numbers.

New comment by vrm in "Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?"

vrm — Fri, 08 Aug 2025 20:37:20 +0000

A 6:1 parameter ratio between the target and draft models is too small for speculative decoding (specdec) to have that much of an effect. You'd really want to see 10:1 or even more for this to start to matter.
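A rough way to see how the ratio enters is the standard idealized analysis of speculative decoding: with per-token acceptance rate α, γ drafted tokens per round, and draft-to-target cost ratio c, the expected speedup is (1 - α^(γ+1)) / ((1 - α)(γc + 1)). The sketch below assumes c is simply the inverse of the parameter ratio and uses made-up values for α and γ. It ignores real-world overheads and the fact that a smaller draft model usually has a lower acceptance rate, so treat it as a toy model rather than a prediction.

```rust
/// Expected speedup of speculative decoding under the idealized model.
/// `alpha`: per-token acceptance rate, must be in [0, 1).
/// `gamma`: number of tokens drafted per round.
/// `cost_ratio`: cost of one draft step relative to one target step.
fn expected_speedup(alpha: f64, gamma: u32, cost_ratio: f64) -> f64 {
    // Expected tokens produced per round (accepted drafts plus one from the target).
    let expected_tokens = (1.0 - alpha.powi(gamma as i32 + 1)) / (1.0 - alpha);
    // Cost of a round, in units of one target-model step.
    let cost_per_round = gamma as f64 * cost_ratio + 1.0;
    expected_tokens / cost_per_round
}

fn main() {
    // Illustrative values only (assumptions, not measurements).
    let alpha = 0.8;
    let gamma = 4;
    for params_ratio in [6.0, 10.0, 20.0] {
        // Assumption: step cost scales linearly with parameter count.
        let s = expected_speedup(alpha, gamma, 1.0 / params_ratio);
        println!("{params_ratio}:1 -> {s:.2}x");
    }
}
```

In practice the achievable gain at a small ratio is lower than this model suggests, because the draft model's latency does not shrink in proportion to its parameter count.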

New comment by vrm in "Diffsitter – A Tree-sitter based AST difftool to get meaningful semantic diffs"

vrm — Thu, 10 Jul 2025 19:33:36 +0000

This is neat! I think there are really deep connections between semantically meaningful diffs (across modalities) and supervision of AI models. You might imagine a human-in-the-loop workflow where the human edits a particular generation and those edits are then used as supervision for a future implementation of whatever produced it. We did some related work on the coding use case here: https://www.tensorzero.com/blog/automatically-evaluating-ai-... but I'm interested in all the different approaches to the problem, especially in less structured domains.

Automatically Evaluating AI Coding Assistants with Each Git Commit

vrm — Sat, 05 Jul 2025 21:34:46 +0000

Article URL: https://www.tensorzero.com/blog/automatically-evaluating-ai-coding-assistants-with-each-git-commit/

Comments URL: https://news.ycombinator.com/item?id=44475733

Points: 3

# Comments: 0

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Tree Search

vrm — Wed, 02 Jul 2025 00:30:12 +0000

Article URL: https://arxiv.org/abs/2503.04412

Comments URL: https://news.ycombinator.com/item?id=44439235

Points: 3

# Comments: 0

New comment by vrm in "Reverse Engineering Cursor's LLM Client"

vrm — Sat, 07 Jun 2025 20:28:22 +0000

If you haven't already, check out our repo: it's free, fully self-hosted, production-grade, and designed for precisely this application :)

https://github.com/TensorZero/tensorzero

New comment by vrm in "Reverse Engineering Cursor's LLM Client"

vrm — Sat, 07 Jun 2025 14:29:20 +0000

I definitely see different prompts based on what I'm doing in the app. As we mentioned, there are different prompts depending on whether you're asking questions, doing Cmd-K edits, working in the shell, etc. I'd also imagine that they customize the prompt by model (unobserved here, but we can also customize per model using TensorZero and A/B test).

New comment by vrm in "Reverse Engineering Cursor's LLM Client"

vrm — Sat, 07 Jun 2025 14:27:13 +0000

We're doing the latter! Cursor lets you configure the OpenAI base URL, so we were able to have Cursor call Ngrok -> Nginx (for auth) -> TensorZero -> LLMs. We explain this in detail in the blog post.

New comment by vrm in "Reverse Engineering Cursor's LLM Client"

vrm — Sat, 07 Jun 2025 12:05:56 +0000

Wireshark would work for seeing the requests from the desktop app to Cursor’s servers (which make the actual LLM requests). But if you’re interested in what the actual requests to LLMs look like from Cursor’s servers, you have to set something like this up. Plus, this lets us modify the request and A/B test variations!

New comment by vrm in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"

vrm — Wed, 14 May 2025 16:03:57 +0000

We're working on an OSS industrial-grade version of this at TensorZero, but there's a long way to go. I think the easiest out-of-the-box solution today is probably OpenAI RFT, but that's a partial solution with substantial vendor lock-in.

New comment by vrm in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"

vrm — Wed, 14 May 2025 15:32:08 +0000

This is very neat work! I'll be interested to see how they make this sort of thing available to the public, but it is clear from some of the results they mention that search + LLM is one path to the production of net-new knowledge from AI systems.