<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: shad42</title><link>https://news.ycombinator.com/user?id=shad42</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 29 Apr 2026 16:25:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=shad42" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>We considered wrapping Claude Code when we started building Mendral (the agent in the article). We ended up building our own agent. It's a lot more work because we followed all the right patterns as the models evolved (sub-agents, proper token caching, reimplementing basic tools like read, write, edit, bash, etc.). But it pays off over time when you build an agent that is focused on a specific task (not a general coding agent).<p>The main driver for writing our own agent was to keep it out of the sandbox (the agent loop runs on our backend, and we call the sandbox only when needed). We wrote another post about that (it's the latest post on the blog).<p>However, I am curious how you would implement the triager pattern using only Claude Code as the harness.</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:59:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47944032</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47944032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47944032</guid></item><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>Nice, it's on our todo list to use oss models too. What are you building?</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:52:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47943997</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47943997</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47943997</guid></item><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>Curious, what steps did you follow to end up with this design (what did you try before)? And what's your use case for this agent?</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:38:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47943930</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47943930</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47943930</guid></item><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>IMO RAG is mostly dead. The game changer with newer models like Opus is the reasoning. So instead of pushing all the context up front (RAG style), it's better to give strong primitives (e.g. bash, SQL) and let the agent figure it out.<p>It's what Claude Code is doing now, and these are the principles we applied to Mendral as well.<p>That said, you're right that some smaller models can outperform Haiku, and we're thinking of supporting oss models at some point. But it does not change the core design principles IMO.</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:32:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47943897</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47943897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47943897</guid></item><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>We're dealing with CI logs produced by a variety of frameworks, languages, etc. The tough ones to look into are e2e tests, with outputs from infrastructure.
I wish a re.match() were enough, but we often don't even know what to match in the first place.<p>We started adding deterministic matching on the patterns the agent sees the most, so we don't have to go through the whole thing (for example, a flake on PostHog can occur 100+ times in a day; you don't need to reinvestigate it every time). But for new errors, it's tricky.</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:27:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47943869</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47943869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47943869</guid></item><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>It's the same as an escalation. Something we omitted from the post is that we often use Sonnet to write SQL queries.<p>We wrote another post that was on HN some time ago that goes into the details of SQL queries (linked at the top of this article). Sonnet is perfect for this.</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:22:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47943843</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47943843</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47943843</guid></item><item><title><![CDATA[New comment by shad42 in "We decreased our LLM costs with Opus"]]></title><description><![CDATA[
<p>I am one of Mendral's co-founders (my co-founder wrote the article), and I am the one to blame for changing the title when posting. I thought our original one was too clickbaity and I wanted this title to summarize it better.<p>Despite the original title, a lot of what we learned comes down to how Opus evolved and its ability to reason, and also the fact that Haiku is quite capable if scoped properly. That's the whole purpose of the article.</p>
]]></description><pubDate>Wed, 29 Apr 2026 03:21:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47943831</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47943831</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47943831</guid></item><item><title><![CDATA[We decreased our LLM costs with Opus]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/frontier-model-lower-costs">https://www.mendral.com/blog/frontier-model-lower-costs</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47942903">https://news.ycombinator.com/item?id=47942903</a></p>
<p>Points: 100</p>
<p># Comments: 29</p>
]]></description><pubDate>Wed, 29 Apr 2026 00:57:12 +0000</pubDate><link>https://www.mendral.com/blog/frontier-model-lower-costs</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47942903</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47942903</guid></item><item><title><![CDATA[Multi-player agents don't fit in the sandbox]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/multi-player-agents-sandbox">https://www.mendral.com/blog/multi-player-agents-sandbox</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47902020">https://news.ycombinator.com/item?id=47902020</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 25 Apr 2026 15:02:09 +0000</pubDate><link>https://www.mendral.com/blog/multi-player-agents-sandbox</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47902020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47902020</guid></item><item><title><![CDATA[We built our AI agent, for analyzing CI logs]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/how-we-built-our-ai-agent">https://www.mendral.com/blog/how-we-built-our-ai-agent</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47781130">https://news.ycombinator.com/item?id=47781130</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 15 Apr 2026 16:09:12 +0000</pubDate><link>https://www.mendral.com/blog/how-we-built-our-ai-agent</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47781130</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47781130</guid></item><item><title><![CDATA[Same LLM, different agent: a CI debugger built on Claude]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/same-llm-different-agent">https://www.mendral.com/blog/same-llm-different-agent</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47767245">https://news.ycombinator.com/item?id=47767245</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 14 Apr 2026 15:50:09 +0000</pubDate><link>https://www.mendral.com/blog/same-llm-different-agent</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47767245</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47767245</guid></item><item><title><![CDATA[Agent Harness: Inside vs. Outside the Sandbox]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/agent-harness-inside-vs-outside-sandbox">https://www.mendral.com/blog/agent-harness-inside-vs-outside-sandbox</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47733248">https://news.ycombinator.com/item?id=47733248</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 11 Apr 2026 19:21:45 +0000</pubDate><link>https://www.mendral.com/blog/agent-harness-inside-vs-outside-sandbox</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47733248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47733248</guid></item><item><title><![CDATA[Same LLM but different output: we built a CI specialist]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/same-llm-different-agent">https://www.mendral.com/blog/same-llm-different-agent</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47677249">https://news.ycombinator.com/item?id=47677249</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 07 Apr 2026 15:54:40 +0000</pubDate><link>https://www.mendral.com/blog/same-llm-different-agent</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47677249</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47677249</guid></item><item><title><![CDATA[We upgraded our agent to Opus and our costs went down]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/frontier-model-lower-costs">https://www.mendral.com/blog/frontier-model-lower-costs</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47666713">https://news.ycombinator.com/item?id=47666713</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 06 Apr 2026 20:38:24 +0000</pubDate><link>https://www.mendral.com/blog/frontier-model-lower-costs</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47666713</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47666713</guid></item><item><title><![CDATA[Same LLM, Different Agent: What Changes When You Specialize for CI]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/same-llm-different-agent">https://www.mendral.com/blog/same-llm-different-agent</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47575082">https://news.ycombinator.com/item?id=47575082</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 30 Mar 2026 14:49:14 +0000</pubDate><link>https://www.mendral.com/blog/same-llm-different-agent</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47575082</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47575082</guid></item><item><title><![CDATA[We decreased our LLM costs by switching to Opus]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/frontier-model-lower-costs">https://www.mendral.com/blog/frontier-model-lower-costs</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47455239">https://news.ycombinator.com/item?id=47455239</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 20 Mar 2026 14:40:12 +0000</pubDate><link>https://www.mendral.com/blog/frontier-model-lower-costs</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47455239</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47455239</guid></item><item><title><![CDATA[New comment by shad42 in "What CI looks like at a 100-person team (PostHog)"]]></title><description><![CDATA[
<p>Mendral co-founder here. What happens at PostHog is not uncommon. While building Mendral, we talked to hundreds of teams and they are all in a similar situation. Initially they come to us to make their CI pipelines faster. But as the agent dives in, the urgent need becomes keeping all pipelines reliable. It comes from growing a code base along with its test suite. Of course it has to change eventually: splitting the test suite, running specific parts of the CI depending on the code, etc. But the situation described in the article is widespread for products that grow quickly.</p>
]]></description><pubDate>Tue, 17 Mar 2026 15:17:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47413892</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47413892</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47413892</guid></item><item><title><![CDATA[What CI looks like at a 100-person team (PostHog)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/ci-at-scale">https://www.mendral.com/blog/ci-at-scale</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47352578">https://news.ycombinator.com/item?id=47352578</a></p>
<p>Points: 56</p>
<p># Comments: 30</p>
]]></description><pubDate>Thu, 12 Mar 2026 15:50:09 +0000</pubDate><link>https://www.mendral.com/blog/ci-at-scale</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47352578</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47352578</guid></item><item><title><![CDATA[We upgraded to a frontier model and our costs went down]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/frontier-model-lower-costs">https://www.mendral.com/blog/frontier-model-lower-costs</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47310061">https://news.ycombinator.com/item?id=47310061</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 09 Mar 2026 15:03:48 +0000</pubDate><link>https://www.mendral.com/blog/frontier-model-lower-costs</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47310061</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47310061</guid></item><item><title><![CDATA[New comment by shad42 in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>In some ways: we use their product and they use Mendral.</p>
]]></description><pubDate>Sat, 28 Feb 2026 05:04:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47190687</link><dc:creator>shad42</dc:creator><comments>https://news.ycombinator.com/item?id=47190687</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47190687</guid></item></channel></rss>