Hacker News: erans

Show HN: I ran every Claude agent turn through the Batch API

erans — Mon, 27 Apr 2026 18:34:45 +0000

I built a tiny Python REPL to answer a dumb-but-useful question:

What happens if every turn in an agent loop goes through Anthropic’s Batch API instead of the normal synchronous endpoint?

The motivation was cost. Batch API is 50% off, which sounds very attractive for agent workloads: evals, background research agents, CI agents, unattended subagents, etc.

The result: it works, but it is awful for a single interactive agent.

In my runs, a one-entry batch usually took ~90–120 seconds to complete. That means a five-turn tool loop becomes a wait of roughly eight to ten minutes. Waiting two minutes for the model to decide it needs to run ls is not a good UX.
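The arithmetic behind that estimate, as a minimal sketch. It assumes each turn waits on its own one-entry batch and ignores tool execution time; the function name is made up for illustration.

```python
def loop_latency_seconds(turns: int, per_batch_seconds: float) -> float:
    """Total wall-clock wait when every turn is its own one-entry batch."""
    return turns * per_batch_seconds


low = loop_latency_seconds(5, 90)    # five turns at the fast end of my runs
high = loop_latency_seconds(5, 120)  # five turns at the slow end
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")  # prints "7.5 to 10.0 minutes"
```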

But that was also the point of the experiment. A single REPL turn is probably the wrong unit to batch.

The interesting version is fleet-level batching:

- many agents running in parallel

- background subagents

- CI/eval jobs

- multiple harnesses sharing a local proxy

- shared prompt prefixes that may benefit from caching

In that world, the batcher should probably sit below the harness as infrastructure. Existing tools keep using the normal API shape, while a proxy decides per request whether it should go sync or async based on latency tolerance.
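A rough sketch of what that per-request decision could look like. This is not code from the repo: `Request`, `max_wait_seconds`, and `choose_route` are hypothetical names, and the 120-second default is just the upper end of what I saw in my runs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    agent_id: str
    max_wait_seconds: float  # hypothetical field: how long the caller can tolerate waiting


def choose_route(req: Request, expected_batch_seconds: float = 120.0) -> str:
    """Return "batch" if the caller can absorb the expected batch latency, else "sync"."""
    return "batch" if req.max_wait_seconds >= expected_batch_seconds else "sync"


print(choose_route(Request("interactive-repl", 5)))   # sync
print(choose_route(Request("nightly-eval", 3600)))    # batch
```

The harness above the proxy would not need to change; only the latency tolerance has to be communicated somehow (a header, a per-client default, etc.), and how to do that is an open design question.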

One surprising observation: in my small, non-rigorous testing, Haiku batches often felt slower than Sonnet/Opus batches. I wouldn’t treat that as a benchmark, but it does suggest routers should measure this rather than assuming “cheap model = batch model.”
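One way a router could measure instead of assume, sketched under my own assumptions: keep a rolling window of observed batch completion times per model and use the median. `BatchLatencyTracker` and the model labels are hypothetical; nothing here comes from the repo.

```python
from collections import defaultdict, deque
from statistics import median


class BatchLatencyTracker:
    """Rolling per-model record of observed batch completion times."""

    def __init__(self, window: int = 50):
        # Each model keeps only its most recent `window` samples.
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, model: str, seconds: float) -> None:
        self._samples[model].append(seconds)

    def typical(self, model: str, default: float = 120.0) -> float:
        """Median observed latency, or `default` when there is no data yet."""
        samples = self._samples.get(model)
        return median(samples) if samples else default


tracker = BatchLatencyTracker()
tracker.record("model-a", 100.0)
tracker.record("model-a", 140.0)
print(tracker.typical("model-a"))  # 120.0
```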

Repo is here:

https://github.com/erans/batching-harness

It is intentionally small: one Python file, a basic tool loop, local shell tool, stats panel, and minimal sandboxing.

The useful lesson for me was:

Batch API is terrible as an interaction pattern for one agent. It might be very useful as a hidden optimization layer for a fleet of agents.

Comments URL: https://news.ycombinator.com/item?id=47925427

Points: 3

# Comments: 0

New comment by erans in "An AI agent deleted our production database. The agent's confession is below"

erans — Sun, 26 Apr 2026 20:26:01 +0000

Execution-layer security must be deterministic. That's why we are working on AgentSH (https://www.agentsh.org), which is model-, framework-, and harness-agnostic.

New comment by erans in "Bitwarden CLI compromised in ongoing Checkmarx supply chain campaign"

erans — Thu, 23 Apr 2026 21:33:11 +0000

The part that seems most important here is that npm install was enough.

Once the compromise point is preinstall, the usual "inspect after install" mindset breaks down. By then the payload has already had a chance to run.
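To make that concrete, here is a minimal sketch of checking a package manifest for install-time lifecycle scripts before anything is installed. It only looks at the `preinstall`, `install`, and `postinstall` entries in `scripts`; the helper name is hypothetical, and a real check would also have to cover transitive dependencies.

```python
import json

# npm lifecycle scripts that run as part of installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time scripts declared in a package.json document."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}


manifest = '{"name": "example", "scripts": {"preinstall": "node setup.js", "test": "jest"}}'
print(install_hooks(manifest))  # {'preinstall': 'node setup.js'}
```

Running `npm install --ignore-scripts` skips these hooks entirely, at the cost of breaking packages that legitimately need them.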

That gets more interesting with agents / CI / ephemeral sandboxes, because short exposure windows are still enough when installs happen automatically and repeatedly.

Another thing I think is worth paying attention to: this payload did not just target secrets, it also targeted AI tooling config, and there is a real possibility that shell-profile tampering becomes a way to poison what the next coding assistant reads into context.

I work on AgentSH (https://www.agentsh.org), and we wrote up a longer take on that angle here:

https://www.canyonroad.ai/blog/the-install-was-the-attack/

New comment by erans in "Cloudflare's AI Platform: an inference layer designed for agents"

erans — Fri, 17 Apr 2026 20:29:07 +0000

It's great to see more platforms like this popping up. It's good for the ecosystem. We need more hosting options that are clear, secure, and able to help people run as many models as possible.

New comment by erans in "Leash: Spreadsheet Based PagerDuty Alternative"

erans — Thu, 23 Oct 2025 20:21:14 +0000

That's great. PagerDuty always felt so expensive and heavy!

New comment by erans in "Show HN: Gocat – URL shortener using Google Sheets as a database"

erans — Wed, 22 Oct 2025 20:37:01 +0000

That's awesome. It's always annoying using those 3rd party ones!

New comment by erans in "Show HN: LunaRoute – a high-performance local proxy for AI coding assistants"

erans — Tue, 21 Oct 2025 23:51:33 +0000

Summary data is in a SQLite database, so it's easily queryable by you or your friendly AI agent.

You can have a single proxy and have multiple team members use the same one so that you can easily track sessions, token usage, spend etc.

Show HN: LunaRoute – a high-performance local proxy for AI coding assistants

erans — Tue, 21 Oct 2025 23:23:26 +0000

LunaRoute is a high-performance local proxy for AI coding assistants like Claude Code, OpenAI Codex CLI, and OpenCode. Get complete visibility into every LLM interaction with zero-overhead passthrough, comprehensive session recording, and powerful debugging capabilities.

- See Everything Your AI Does - get full logs (JSONL) and session summaries, including tokens used (input/output), tool usage, and success rates.

- Privacy & Compliance Built-In - redact or tokenize any sensitive information (regex based).

- Speaks OpenAI and Anthropic dialects so you can route (and translate) when needed between models and providers.

- High performance - passthrough adds only 0.1 ms to 0.2 ms of latency. Logging and summarization are offloaded to a secondary thread to allow maximum performance.
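For a feel of what regex-based redaction means in practice, here is a generic sketch. It is not LunaRoute's implementation or config format; the patterns and the `redact` helper are illustrative assumptions only.

```python
import re

# Illustrative patterns; real deployments would tune these to their own data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("mail bob@example.com key sk-abcdefghijklmnop1234"))
# mail [EMAIL] key [API_KEY]
```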

Feedback is always appreciated as well as stars on the repo :)

Comments URL: https://news.ycombinator.com/item?id=45663045

Points: 8

# Comments: 1

New comment by erans in ""Be Different" doesn't work for building products anymore"

erans — Mon, 06 Oct 2025 17:08:14 +0000

True, there is some kind of ceiling on what can or can't be done. But that ceiling is way up there. Also, there are enough examples, articles, and code out there to allow enough combinations that the result is good enough, and that is a very important bar.

There are A LOT of businesses (even big ones managing money and whatnot) that rely on spreadsheets to do so much. Could this have been an app/service/SaaS/whatever? Probably.

What if these orgs can (mostly) solidify some of these processes internally? What if they don't need an insanely expensive Salesforce implementor who can add "custom logic"?

A lot of times companies will replace "complex software" with a half-complex process!

What if they don't need Salesforce at all because they need a reasonably simple CRM and don't want to (or shouldn't) pay $10k/seat/year?

There are still going to be very differentiating apps and services here and there, but as time moves on these "technological" advantages will erode, and with AI they erode way faster.

New comment by erans in "Show HN: Selfhostllm.org – Plan GPU capacity for self-hosting LLMs"

erans — Fri, 08 Aug 2025 18:03:00 +0000

I also added a Mac version: https://selfhostllm.org/mac/ so you can know which models you can run on your Mac and get an estimated tokens/sec.

Show HN: Selfhostllm.org – Plan GPU capacity for self-hosting LLMs

erans — Fri, 08 Aug 2025 17:49:49 +0000

A simple calculator that estimates how many concurrent requests your GPU can handle for a given LLM, with shareable results.
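A back-of-envelope version of this kind of estimate, as a sketch only. The formula below (VRAM left after weights and a fixed overhead, divided by KV-cache memory per request) is a common simplification and an assumption on my part, not necessarily what the site computes; all parameter names and the 2 GB overhead default are hypothetical.

```python
import math


def max_concurrent_requests(
    vram_gb: float,
    model_gb: float,
    kv_cache_gb_per_request: float,
    overhead_gb: float = 2.0,  # assumed allowance for runtime and activations
) -> int:
    """Estimate how many requests fit in the VRAM left after loading the model."""
    free_gb = vram_gb - model_gb - overhead_gb
    if free_gb <= 0 or kv_cache_gb_per_request <= 0:
        return 0
    return math.floor(free_gb / kv_cache_gb_per_request)


# 24 GB card, 14 GB of weights, 2 GB of KV cache per request at the target context length.
print(max_concurrent_requests(24, 14, 2.0))  # 4
```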

Comments URL: https://news.ycombinator.com/item?id=44839717

Points: 7

# Comments: 3

New comment by erans in "MCPs, Gatekeepers, and the Future of AI"

erans — Tue, 22 Apr 2025 18:20:33 +0000

I would argue that due to the way MCP servers/tools are added to calls, there will be a pre-step that will figure out which MCPs are even relevant for a request prior to executing it.
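A toy illustration of such a pre-step, assuming nothing about how any real client does it: score each tool by word overlap between the request and the tool's description, and only pass along the tools that clear a threshold. The tool names, descriptions, and `relevant_tools` helper are all made up; a real system would more likely use embeddings or a cheap model call.

```python
def relevant_tools(request: str, tools: dict[str, str], min_overlap: int = 2) -> list[str]:
    """Return tool names whose description shares at least `min_overlap` words with the request."""
    words = set(request.lower().split())
    return [
        name
        for name, description in tools.items()
        if len(words & set(description.lower().split())) >= min_overlap
    ]


tools = {
    "github": "search issues and pull requests in a repository",
    "calendar": "create and list calendar events",
}
print(relevant_tools("list open issues in the repository", tools))  # ['github']
```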

New comment by erans in "Show HN: EFF Dice based digital random passphrase generator"

erans — Tue, 13 Sep 2016 20:08:22 +0000

Thanks :-) Fixed it. Don't drink and copy&paste ;-)

Show HN: EFF Dice based digital random passphrase generator

erans — Tue, 13 Sep 2016 18:55:12 +0000

Article URL: https://dicepass.org

Comments URL: https://news.ycombinator.com/item?id=12491179

Points: 3

# Comments: 2

New comment by erans in "Best Linux laptop for web development?"

erans — Sat, 02 Apr 2016 18:12:12 +0000

Most ThinkPads will work just fine out of the box. The hardware is great (I specifically like the keyboard and the little red TrackPoint).

The T series (and W series) are the workhorses here, with various screen sizes, disk options, and RAM.

I personally use the X1 Carbon 3rd gen (X series), as it is comparable in specification to a MacBook Pro 13" (I have the one with 16GB of RAM). It's a little costly, but it's worthwhile and will probably serve me for the next 2+ years.