Hacker News: shad42

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:59:58 +0000

We considered wrapping Claude Code when we started building Mendral (the agent in the article). We ended up building our own agent. It's a lot more work, because we had to keep up with the right patterns as the models evolved (sub-agents, proper token caching, rebuilding basic tools like read, write, edit, bash, etc.). But it pays off over time when you build an agent that is focused on a specific task (not a general coding agent).

The main driver for writing our own agent was to leave it out of the sandbox (the agent loop runs on our backend, we call the sandbox only when needed). We wrote another post about that (it's the latest post on the blog).
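To make the "loop on the backend, sandbox only when needed" split concrete, here is a minimal sketch. It is not Mendral's implementation: `Sandbox`, `LazySandbox`, `agent_loop`, and the `next_action` callback are all hypothetical names, and the sandbox here only records commands instead of running them. The point it illustrates is that the loop lives in the backend process and a sandbox is only created the first time a step actually needs one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Sandbox:
    """Hypothetical stand-in for a remote sandbox; it records commands instead of running them."""
    commands: List[str] = field(default_factory=list)

    def run(self, command: str) -> str:
        self.commands.append(command)
        return f"(sandbox output for: {command})"


class LazySandbox:
    """Creates the underlying sandbox only on first use."""

    def __init__(self, factory: Callable[[], Sandbox]):
        self._factory = factory
        self._instance: Optional[Sandbox] = None

    @property
    def started(self) -> bool:
        return self._instance is not None

    def run(self, command: str) -> str:
        if self._instance is None:
            self._instance = self._factory()
        return self._instance.run(command)


def agent_loop(
    next_action: Callable[[list], Tuple[str, str]],
    sandbox: LazySandbox,
    max_steps: int = 10,
) -> str:
    """Runs on the backend. next_action stands in for the model call and
    returns (kind, payload), where kind is "think", "bash", or "done"."""
    history: list = []
    for _ in range(max_steps):
        kind, payload = next_action(history)
        if kind == "done":
            return payload
        if kind == "bash":
            # Only this branch touches the sandbox.
            history.append(("bash", sandbox.run(payload)))
        else:
            history.append(("think", payload))
    return "step limit reached"


def scripted(actions):
    """Test helper: replays a fixed list of actions instead of calling a model."""
    return lambda history: actions[len(history)]
```

With a scripted run that never issues a `bash` action, `sandbox.started` stays `False`, so a task that needs no sandbox never pays for one.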

However, I am curious how you would implement the triager pattern using only Claude Code as the harness.

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:52:13 +0000

Nice, it's on our todo list to use OSS models too. What are you building?

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:38:45 +0000

Curious, what steps did you follow to end up with this design (what did you try before)? And what's your use case for this agent?

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:32:01 +0000

IMO RAG is mostly dead. The game changer with newer models like Opus is the reasoning. So instead of pushing all the context up front (RAG style), it's better to give the agent strong primitives (e.g. bash, SQL) and let it figure things out.

It's what Claude Code is doing now and the principles we applied for Mendral as well.

That said, you're right that some smaller models can outperform Haiku, and we're thinking about supporting OSS models at some point. But it does not change the core design principles IMO.
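As an illustration of the "strong primitives" idea above, here is a minimal sketch of a SQL tool an agent could call instead of receiving pre-retrieved context. Everything specific in it is an assumption for the example: the `make_sql_tool` and `run_sql` names, the `ci_logs` table and its columns, and the use of in-memory SQLite as a stand-in for a real log store. Errors are returned as text rather than raised, so the model can read them and correct its own query.

```python
import sqlite3


def make_sql_tool(conn: sqlite3.Connection, max_rows: int = 50):
    """Returns a run_sql(query) function suitable for exposing as an agent tool."""

    def run_sql(query: str) -> str:
        # Crude read-only guard for the sketch; a real setup would also
        # rely on database-level permissions.
        if not query.lstrip().lower().startswith(("select", "with")):
            return "error: only SELECT queries are allowed"
        try:
            cursor = conn.execute(query)
        except sqlite3.Error as exc:
            return f"error: {exc}"
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchmany(max_rows)  # cap output so it fits in context
        lines = ["\t".join(columns)]
        lines += ["\t".join(str(value) for value in row) for row in rows]
        return "\n".join(lines)

    return run_sql


# Hypothetical data, only to exercise the tool.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ci_logs (job TEXT, line TEXT)")
conn.executemany(
    "INSERT INTO ci_logs VALUES (?, ?)",
    [
        ("e2e", "TimeoutError"),
        ("e2e", "connection reset"),
        ("unit", "AssertionError"),
    ],
)
run_sql = make_sql_tool(conn)
```

The agent decides which queries to run and in what order; the tool only bounds what comes back.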

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:27:26 +0000

We're dealing with CI logs produced by a variety of frameworks, languages, etc. The tough ones to look into are e2e tests, with outputs from infrastructure. I wish a re.match() were enough, but we often don't even know what to match in the first place.

We started to add deterministic matching on the patterns that the agent sees the most so we don't have to go through the whole thing (for example, a flake on PostHog can occur 100+ times in a day; you don't need to reinvestigate every time). But for new errors, it's tricky.
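One way to picture deterministic matching on already-seen failures is a fingerprint cache, sketched below. This is an assumption about how such matching could work, not a description of what Mendral does: the normalization rules, `fingerprint`, and `KnownFailures` are hypothetical. The idea is to strip the parts of an error line that vary between runs, hash the rest, and only hand the failure to the agent when the hash has not been seen before.

```python
import hashlib
import re
from typing import Dict, Optional

# Hypothetical normalization rules: replace details that change between runs.
# Order matters: timestamps and hex values are replaced before bare numbers.
_VOLATILE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?Z?"), "<ts>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def fingerprint(error_line: str) -> str:
    """Hashes an error line after masking run-specific details."""
    normalized = error_line.strip()
    for pattern, placeholder in _VOLATILE:
        normalized = pattern.sub(placeholder, normalized)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


class KnownFailures:
    """Maps fingerprints to a verdict the agent reached earlier."""

    def __init__(self) -> None:
        self._verdicts: Dict[str, str] = {}

    def record(self, error_line: str, verdict: str) -> None:
        self._verdicts[fingerprint(error_line)] = verdict

    def lookup(self, error_line: str) -> Optional[str]:
        # None means "never seen": this is where the agent would take over.
        return self._verdicts.get(fingerprint(error_line))


known = KnownFailures()
known.record(
    "2026-04-29T03:27:26Z TimeoutError: worker 12 did not respond after 30s",
    "known flake",
)
```

A later occurrence with a different timestamp, worker id, or timeout value maps to the same fingerprint, so the stored verdict is returned without a new investigation. A genuinely new error returns `None`, which is the tricky case the comment refers to.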

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:22:57 +0000

It's the same as an escalation. Something we omitted from the post is that we often use Sonnet to write SQL queries.

We wrote another post that was on HN some time ago that goes into the details of SQL queries (linked at the top of this article). Sonnet is perfect for this.

New comment by shad42 in "We decreased our LLM costs with Opus"

shad42 — Wed, 29 Apr 2026 03:21:24 +0000

I am one of Mendral's co-founders (my co-founder wrote the article), and I am the one to blame for changing the title when posting. I thought our original one was too clickbaity and I wanted this title to summarize the article better.

Despite the original title, a lot of what we learned comes down to how Opus evolved and its ability to reason, and also to the fact that Haiku is quite capable if scoped properly. That's the whole point of the article.

We decreased our LLM costs with Opus

shad42 — Wed, 29 Apr 2026 00:57:12 +0000

Article URL: https://www.mendral.com/blog/frontier-model-lower-costs

Comments URL: https://news.ycombinator.com/item?id=47942903

Points: 100

# Comments: 29

Multi-player agents don't fit in the sandbox

shad42 — Sat, 25 Apr 2026 15:02:09 +0000

Article URL: https://www.mendral.com/blog/multi-player-agents-sandbox

Comments URL: https://news.ycombinator.com/item?id=47902020

Points: 2

# Comments: 0

We built our AI agent, for analyzing CI logs

shad42 — Wed, 15 Apr 2026 16:09:12 +0000

Article URL: https://www.mendral.com/blog/how-we-built-our-ai-agent

Comments URL: https://news.ycombinator.com/item?id=47781130

Points: 1

# Comments: 0

Same LLM, different agent: a CI debugger built on Claude

shad42 — Tue, 14 Apr 2026 15:50:09 +0000

Article URL: https://www.mendral.com/blog/same-llm-different-agent

Comments URL: https://news.ycombinator.com/item?id=47767245

Points: 2

# Comments: 0

Agent Harness: Inside vs. Outside the Sandbox

shad42 — Sat, 11 Apr 2026 19:21:45 +0000

Article URL: https://www.mendral.com/blog/agent-harness-inside-vs-outside-sandbox

Comments URL: https://news.ycombinator.com/item?id=47733248

Points: 3

# Comments: 0

Same LLM but different output: we built a CI specialist

shad42 — Tue, 07 Apr 2026 15:54:40 +0000

Article URL: https://www.mendral.com/blog/same-llm-different-agent

Comments URL: https://news.ycombinator.com/item?id=47677249

Points: 1

# Comments: 0

We upgraded our agent to Opus and our costs went down

shad42 — Mon, 06 Apr 2026 20:38:24 +0000

Article URL: https://www.mendral.com/blog/frontier-model-lower-costs

Comments URL: https://news.ycombinator.com/item?id=47666713

Points: 2

# Comments: 0

Same LLM, Different Agent: What Changes When You Specialize for CI

shad42 — Mon, 30 Mar 2026 14:49:14 +0000

Article URL: https://www.mendral.com/blog/same-llm-different-agent

Comments URL: https://news.ycombinator.com/item?id=47575082

Points: 3

# Comments: 0

We decreased our LLM costs by switching to Opus

shad42 — Fri, 20 Mar 2026 14:40:12 +0000

Article URL: https://www.mendral.com/blog/frontier-model-lower-costs

Comments URL: https://news.ycombinator.com/item?id=47455239

Points: 2

# Comments: 0

New comment by shad42 in "What CI looks like at a 100-person team (PostHog)"

shad42 — Tue, 17 Mar 2026 15:17:28 +0000

Mendral co-founder here. What happens at PostHog is not uncommon. While building Mendral, we talked to hundreds of teams and they all have a similar situation. Initially they come to us to make their CI pipelines faster. But as the agent dives in, the urgency becomes keeping all pipelines reliable. It comes from growing a code base with a test suite. Of course it has to change eventually: splitting the test suite, running specific parts of the CI depending on the code, etc. But the situation described in the article is widespread with a product that grows quickly.

What CI looks like at a 100-person team (PostHog)

shad42 — Thu, 12 Mar 2026 15:50:09 +0000

Article URL: https://www.mendral.com/blog/ci-at-scale

Comments URL: https://news.ycombinator.com/item?id=47352578

Points: 56

# Comments: 30

We upgraded to a frontier model and our costs went down

shad42 — Mon, 09 Mar 2026 15:03:48 +0000

Article URL: https://www.mendral.com/blog/frontier-model-lower-costs

Comments URL: https://news.ycombinator.com/item?id=47310061

Points: 1

# Comments: 0

New comment by shad42 in "We gave terabytes of CI logs to an LLM"

shad42 — Sat, 28 Feb 2026 05:04:12 +0000

In some ways: we use their product and they use Mendral