Hacker News: adchurch

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Mon, 29 Jun 2026 13:46:11 +0000

Effectively yes (based on cost though, not raw token count)

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Sat, 27 Jun 2026 16:41:03 +0000

We trained a model to select which LLM to call at any given turn, based on lots of agent traces

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Sat, 27 Jun 2026 16:39:32 +0000

Yes, the open source models are very good, and that’s a big part of why this router saves so much money in practice! There are definitely still some things they don’t handle well, though, where you do want a frontier model.

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Sat, 27 Jun 2026 16:37:47 +0000

Yes we can route to Gemini models too and we handle all the translation complexity there!

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Sat, 27 Jun 2026 16:34:19 +0000

We welcome the competition :)

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Sat, 27 Jun 2026 16:33:03 +0000

Yep exactly

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Sat, 27 Jun 2026 02:29:23 +0000

Yes, because it's a model explicitly trained to make model selections! Opus probably doesn't have a great idea of when to send a task to DeepSeek vs. Sonnet, for example.

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:44:04 +0000

We haven't experimented with routing to local LLMs much. Technically they benefit from the cache too although it's more a question of latency than cost. But tbh I haven't seen great results in the wild from working with local LLMs for coding - curious if you've had any success with them?

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:42:08 +0000

I think the key detail here is that we use embeddings of the prompt + previous context in order to decide where to route the request, and if one model is getting stuck we can course correct and move to a different model.

So: we can reasonably cluster similar problems together and learn how models handle them, and the entire system doesn't fail if the initial decision is off.
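To make the idea above concrete, here is a minimal sketch of embedding-based routing with course correction. It is not the actual router: the cluster centroids, success rates, model names, and relative costs are all hypothetical, and a real system would get its embeddings from an embedding model instead of hand-written 2D vectors.

```python
import math

# Hypothetical data: each cluster has a centroid embedding and per-model
# success rates that would be learned from agent traces. All values are made up.
CLUSTERS = [
    {"centroid": [1.0, 0.0], "success": {"small-model": 0.90, "frontier-model": 0.95}},
    {"centroid": [0.0, 1.0], "success": {"small-model": 0.40, "frontier-model": 0.90}},
]
RELATIVE_COST = {"small-model": 1.0, "frontier-model": 5.0}  # illustrative only


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(embedding, min_success=0.8, exclude=()):
    """Pick the cheapest model expected to succeed on the nearest cluster.

    `exclude` lists models that already got stuck on this task, which is how
    this sketch models course correction.
    """
    cluster = max(CLUSTERS, key=lambda c: cosine(embedding, c["centroid"]))
    success = cluster["success"]
    candidates = [m for m in success if m not in exclude] or list(success)
    good = [m for m in candidates if success[m] >= min_success]
    if good:
        return min(good, key=RELATIVE_COST.get)
    # Nothing clears the bar: fall back to the most reliable remaining model.
    return max(candidates, key=success.get)
```

Calling `route(embedding)` gives the first choice; if that model gets stuck, calling it again with `exclude=("small-model",)` moves the task to the next-best option, so a wrong initial decision is recoverable.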

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:34:52 +0000

We consider the cost of missing the cache when making each routing decision after the initial one. Discussed in a bit more depth here: https://news.ycombinator.com/item?id=48689448
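As a rough illustration of what weighing a cache miss could look like, the sketch below compares staying on the current model (context read from cache) against switching (context re-read uncached once, then cached). The function names and all prices are assumptions for illustration, output tokens and answer quality are ignored, and the context size is held constant across turns to keep the arithmetic simple.

```python
# Hypothetical prices in dollars per million input tokens.
FRONTIER = {"input": 5.0, "cached": 0.5}
MID = {"input": 3.0, "cached": 0.3}
CHEAP = {"input": 0.5, "cached": 0.05}


def turn_cost(context_tokens, new_tokens, price, cache_hit):
    """Input cost in dollars for one turn."""
    context_price = price["cached"] if cache_hit else price["input"]
    return (context_tokens * context_price + new_tokens * price["input"]) / 1_000_000


def should_switch(context_tokens, new_tokens, current, candidate, expected_turns=1):
    """Return True if switching is cheaper over the next `expected_turns` turns.

    Staying hits the current model's cache every turn. Switching pays the
    uncached price for the whole context once, then hits the new cache.
    """
    stay = expected_turns * turn_cost(context_tokens, new_tokens, current, cache_hit=True)
    switch = turn_cost(context_tokens, new_tokens, candidate, cache_hit=False)
    switch += (expected_turns - 1) * turn_cost(
        context_tokens, new_tokens, candidate, cache_hit=True
    )
    return switch < stay
```

With these made-up prices and a 100k-token context, `should_switch(100_000, 1_000, FRONTIER, CHEAP)` favors switching, while `should_switch(100_000, 1_000, FRONTIER, MID)` does not: the slightly cheaper model loses once the uncached re-read is counted.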

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:33:08 +0000

Good questions. From what I can tell, vLLM semantic router is more optimized for one-off prompt/response workflows rather than agentic coding (I don't think it's cache aware).

As another commenter (https://news.ycombinator.com/item?id=48689994) pointed out, for one-off requests, I think it makes more sense to lock to one model whose behavior you understand very well. For dynamic requests like the ones going to a coding agent I think dynamic routing makes more sense but it does need to be cache aware.

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:27:44 +0000

Cool, interested to see your approach when you do launch! I think it's a really interesting problem

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:26:58 +0000

Great question! Our main product quantifies engineering productivity & quality so I think we're uniquely qualified to answer this - our velocity has only gone up and our quality (bugs introduced, code turnover) has not budged per our own analysis.

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:24:40 +0000

Oh interesting, didn't know Cursor did that! Totally makes sense though, routing subagents is def the easiest win, no need to have any cache awareness.

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 21:23:22 +0000

If you have a Claude sub with subsidized usage we use that. If not you pay API prices.

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 20:07:22 +0000

Really appreciate the thoughtful feedback!

1. Agree it's important. Fwiw the proxy model doesn't blow this up, though: it only incurs a one-time cost when switching models, and we account for that when making routing decisions.

2. The agents are model-aware, yes, but they are not incentivized to optimize too heavily here (in particular, they don't use open source models even when those would be better). I think that's where this router comes in and brings genuine improvement.

3. Two parts here: 1 is continuing to grow our golden dataset over time, 2 is using reward signals from production traffic (on a per-customer basis or, if allowed, across all users)

4. Yes we have these internally, great callout that we should publish! Will do + will link from the repo soon. (Fwiw I think these benchmarks are useful but don't fully capture vibes - you should try it out yourself for that!)

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 20:02:29 +0000

Appreciate the kind words! Lmk if you have any feedback on it from using!

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 20:01:41 +0000

I would argue they do not have a good incentive to build this and make it better. Why would Anthropic route Claude Code traffic to DeepSeek (at 20% of the cost)?

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 19:01:57 +0000

Very important consideration, addressed it in another thread (https://news.ycombinator.com/item?id=48689448). tl;dr we built this to be cache aware for exactly this reason

New comment by adchurch in "Show HN: Smart model routing directly in Claude, Codex and Cursor"

adchurch — Fri, 26 Jun 2026 19:00:36 +0000

When we started building this we did it as an experiment and we thought the same thing might be true (cache misses would make the whole thing pointless). This turned out not to be true! I think there are 3 reasons intuitively:

1. Small models can carry out a good number of requests e2e

2. Small model for part of a request + cache miss < big model for entire request in many cases

3. Subagents

For our own usage we've saved 40% so far (that is of course including costs of uncached requests when switching models)
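Reason 2 can be checked with back-of-the-envelope arithmetic. The sketch below is a toy model with made-up prices, not a reproduction of the 40% figure: it totals input cost over a session where context grows every turn, charging the cached rate for tokens a model has already seen and the full rate for everything else. It ignores output tokens and assumes caches never expire.

```python
# Hypothetical prices in dollars per million input tokens.
PRICES = {
    "big": {"input": 5.0, "cached": 0.5},
    "small": {"input": 0.5, "cached": 0.05},
}


def session_cost(models_per_turn, tokens_per_turn, prices):
    """Total input cost in dollars for a session with one model chosen per turn."""
    cached = {}  # model -> number of prefix tokens that model has already cached
    total = 0.0
    for turn, model in enumerate(models_per_turn):
        context = turn * tokens_per_turn  # everything said before this turn
        hit = cached.get(model, 0)  # prefix tokens billed at the cached rate
        miss = context - hit + tokens_per_turn  # uncached context plus new tokens
        total += hit * prices[model]["cached"] + miss * prices[model]["input"]
        cached[model] = context + tokens_per_turn
    return total / 1_000_000


big_only = session_cost(["big"] * 10, 10_000, PRICES)
hybrid = session_cost(["small"] * 6 + ["big"] * 4, 10_000, PRICES)
```

Under these assumptions `hybrid` comes out lower than `big_only`, even though the switch at turn 7 re-reads 60k tokens of context uncached on the big model.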