<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: aluzzardi</title><link>https://news.ycombinator.com/user?id=aluzzardi</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 07 May 2026 08:29:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=aluzzardi" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Thank you, appreciate it!<p>Regarding scoping: in our case, the agent loop runs the same way our API server does (i.e., it’s a multi-tenant service running in a container somewhere), and we solve scoping the same way.<p>To put it another way: whether it’s the API receiving “GET /memories/id” or the LLM requesting “Read(/memories/id)”, we do pretty much the same thing (check authN/authZ, scope the db request, etc.).<p>Basically, the LLM is just another API client using a slightly different format for inputs and outputs, but sharing the same permission layer.</p>
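<p>A minimal sketch of that "one shared permission layer" idea: both the HTTP route and the LLM tool call funnel through the same authorization and scoping code. All names here (authorize, fetch_memory, the user dict shape) are illustrative, not the author's actual code.</p>

```python
# Hypothetical sketch: API handler and LLM tool handler share one
# authN/authZ choke point. Nothing here is the real implementation.

class NotAuthorized(Exception):
    pass

def authorize(user, action, resource_id):
    # Single permission check shared by every caller.
    if resource_id not in user["memory_ids"]:
        raise NotAuthorized(f"{user['id']} cannot {action} {resource_id}")

def fetch_memory(user, memory_id):
    authorize(user, "read", memory_id)
    # A real DB query would also be scoped to user["tenant"] here.
    return {"id": memory_id, "tenant": user["tenant"]}

# HTTP path: GET /memories/<id>
def api_get_memory(user, memory_id):
    return fetch_memory(user, memory_id)

# LLM path: the model emitted Read(/memories/<id>)
def llm_tool_read(user, path):
    memory_id = path.rsplit("/", 1)[-1]
    return fetch_memory(user, memory_id)  # same checks, same scoping

user = {"id": "u1", "tenant": "t1", "memory_ids": {"m1"}}
assert api_get_memory(user, "m1") == llm_tool_read(user, "/memories/m1")
```

Both entry points are thin wrappers; the interesting part is that neither can bypass `authorize`.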
]]></description><pubDate>Sun, 03 May 2026 07:28:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47994351</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47994351</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47994351</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here.<p>I should have made it clearer that the article is about building agents and harnesses (not about running third-party agents).<p>> I barely trust the harness more than the LLM<p>Since we built it, I trust it just as much as I trust our API server :)<p>The latter gets untrusted inputs from the internet, while the former gets untrusted inputs from the LLM.</p>
]]></description><pubDate>Sun, 03 May 2026 01:05:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47992217</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47992217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47992217</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here.<p>I’m worried about the same thing (models tuned for specific harnesses).<p>We actually work around that by respecting the “contract”. For instance, our harness’s Bash signature is exactly the same as Claude’s. We do our sandboxing stuff and respond using the same format.<p>In the “eyes” of the model there’s no difference between what Claude does and what we do (even though the implementation is completely different).<p>We basically use Claude’s tools as an API contract.</p>
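<p>Concretely, "using Claude's tools as the API contract" could look something like this: expose a tool whose name and input schema match what the model already expects, while the implementation routes elsewhere. The schema fields shown are an approximation of Claude Code's Bash tool, not an official spec, and the sandbox call is a placeholder.</p>

```python
# Sketch: keep the model-facing contract (tool name + schema) identical
# to Claude's Bash tool, but swap the backend for a remote sandbox.

BASH_TOOL = {
    "name": "Bash",
    "description": "Run a shell command",
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {"type": "string"},
            "timeout": {"type": "number"},
        },
        "required": ["command"],
    },
}

def run_in_remote_sandbox(command):
    # Placeholder: forward to a sandbox service instead of local exec.
    return {"stdout": f"(sandboxed) ran: {command}", "exit_code": 0}

def handle_tool_call(name, tool_input):
    # The model sees the familiar contract; we control the execution.
    if name == "Bash":
        return run_in_remote_sandbox(tool_input["command"])
    raise ValueError(f"unknown tool: {name}")

result = handle_tool_call("Bash", {"command": "ls"})
assert result["exit_code"] == 0
```

The model can't observe where the command ran; it only sees the same response format it was trained against.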
]]></description><pubDate>Sun, 03 May 2026 00:57:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47992169</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47992169</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47992169</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here.<p>This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)<p>At the end of the day, and oversimplifying things: why would I want to spawn a for loop that calls an API (the LLM) in its own dedicated sandbox/computer?<p>When the model wants to run a command, it’ll tell you so. It doesn’t need to be a local exec: you can run it anywhere, and the model won’t know the difference.<p>The agent loop itself doesn’t need sandboxing. In many cases, most tool calls don’t require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole harness in that sandbox.<p>To me, running the agent loop in the sandbox itself feels like saying “you should run your API in your DB container because it’ll talk to it at some point”.</p>
]]></description><pubDate>Sun, 03 May 2026 00:52:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47992136</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47992136</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47992136</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here.<p>I think the confusion is that “agent” is used for two very different things:<p>- building an agent<p>- an “agent” product/runtime (Claude Code, etc)<p>In the first case, the model never executes anything. It just outputs something like “call this API”. Your code is the one doing it, with whatever validation you want. There’s no need for a sandbox there because there’s no arbitrary execution.</p>
]]></description><pubDate>Sun, 03 May 2026 00:16:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47991891</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47991891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47991891</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here.<p>In my opinion, the main driver here is how fast models have evolved in the past 12 months. It makes the architecture of everything around them obsolete, very fast.<p>We went from using models as a building block, wrapping them in heavy workflow code, to now models being smart enough to drive their own workflows and planning.</p>
]]></description><pubDate>Sat, 02 May 2026 23:47:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47991730</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47991730</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47991730</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here. Because of parallelism and non-determinism.<p>This problem is quite common and not limited to memories. For instance, Claude Code will block write attempts and steer the agent to perform a read first (because the file might have been modified in the meantime by the user or another agent).<p>Same principle here: rather than trying to deterministically “merge” concurrent writes, you fail the last write and let the agent read again and try another write.</p>
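<p>The fail-the-stale-write strategy above is essentially optimistic concurrency control. A minimal sketch, with a hypothetical version-tagged store (not the actual implementation):</p>

```python
# Sketch of optimistic concurrency: a write must carry the version it
# read; if the stored version has moved on, the write is rejected and
# the agent is expected to re-read and retry.

class StaleWriteError(Exception):
    pass

store = {"notes.md": {"content": "v1 text", "version": 1}}

def read(path):
    entry = store[path]
    return entry["content"], entry["version"]

def write(path, content, expected_version):
    entry = store[path]
    if entry["version"] != expected_version:
        # Fail the last writer; no deterministic merge attempted.
        raise StaleWriteError(f"{path} changed since version {expected_version}")
    store[path] = {"content": content, "version": expected_version + 1}

_, v = read("notes.md")
write("notes.md", "agent A edit", v)        # succeeds, bumps to version 2
try:
    write("notes.md", "agent B edit", v)    # stale version -> rejected
except StaleWriteError:
    _, v = read("notes.md")                 # re-read, then retry
    write("notes.md", "agent B edit", v)

assert store["notes.md"]["version"] == 3
```

The retry path is exactly the "read again, try another write" loop the comment describes, just made mechanical.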
]]></description><pubDate>Sat, 02 May 2026 22:40:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47991312</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47991312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47991312</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here. My definition is: you take an agent, remove the model, and you’re left with the harness.<p>Tools, memories, sandboxing, steering, etc.</p>
]]></description><pubDate>Sat, 02 May 2026 22:32:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47991256</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47991256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47991256</guid></item><item><title><![CDATA[New comment by aluzzardi in "The agent harness belongs outside the sandbox"]]></title><description><![CDATA[
<p>Author here. Depending on how it’s designed, the harness itself doesn’t need any sandboxing.<p>At the end of the day, it’s a “simple” loop that calls an external API (LLM) and receives requests to execute stuff on its behalf.<p>It’s not the agent running bash commands: you (the harness author) are, and you’re in full control of where and how those commands get executed.<p>In the article’s case, bash commands are forwarded to a sandbox, nothing ever runs on the harness itself (it physically can’t, local execution is not even implemented in the harness).</p>
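<p>The "simple loop" described above can be sketched in a few lines. Here `call_llm` and `sandbox_exec` are stand-ins for a real model API and a sandbox RPC; the point is that local execution never appears anywhere in the harness.</p>

```python
# Sketch of a harness loop that lives outside the sandbox: it calls the
# LLM API, and when the model requests a Bash command, forwards it to a
# remote sandbox. There is no local exec path to exploit.

def call_llm(messages):
    # Stand-in for an LLM API call; returns either text or a tool request.
    return {"type": "tool_use", "name": "Bash", "input": {"command": "ls"}}

def sandbox_exec(command):
    # Stand-in for an RPC to a remote sandbox; nothing runs locally.
    return {"stdout": f"ran in sandbox: {command}", "exit_code": 0}

def agent_loop(messages, max_turns=1):
    for _ in range(max_turns):
        reply = call_llm(messages)
        if reply["type"] == "tool_use" and reply["name"] == "Bash":
            # The harness, not the model, decides where this executes.
            result = sandbox_exec(reply["input"]["command"])
            messages.append({"role": "tool", "content": result["stdout"]})
        else:
            return reply
    return messages

out = agent_loop([{"role": "user", "content": "list files"}])
```

From the model's point of view it "ran a command"; from the harness's point of view it issued one scoped RPC.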
]]></description><pubDate>Sat, 02 May 2026 22:19:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47991161</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47991161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47991161</guid></item><item><title><![CDATA[We Built Our AI Agent]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/how-we-built-our-ai-agent">https://www.mendral.com/blog/how-we-built-our-ai-agent</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47352272">https://news.ycombinator.com/item?id=47352272</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 12 Mar 2026 15:33:13 +0000</pubDate><link>https://www.mendral.com/blog/how-we-built-our-ai-agent</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47352272</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47352272</guid></item><item><title><![CDATA[Our Agent's Most Important Job Is Deciding Not to Think]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.mendral.com/blog/agent-orchestration-model-hierarchy">https://www.mendral.com/blog/agent-orchestration-model-hierarchy</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47276519">https://news.ycombinator.com/item?id=47276519</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 06 Mar 2026 15:52:01 +0000</pubDate><link>https://www.mendral.com/blog/agent-orchestration-model-hierarchy</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47276519</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47276519</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>It started with Sonnet 4.0 as a single agent, and now it’s a mix of Opus 4.6 and Haiku 4.5 agents.<p>Opus plans the investigation and orchestrates the searches.<p>Haiku is the one actually querying ClickHouse and returning the relevant bits.</p>
]]></description><pubDate>Sat, 28 Feb 2026 04:25:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47190408</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47190408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47190408</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>> it's not magic and you need to make the job of the agent easier by giving it good instructions, tools, and environments.<p>This. We had much better success letting the agent pull context rather than trying to push what we thought was relevant.<p>Turns out it's exactly like a human: if you push the wrong context, it'll influence them to follow the wrong pattern.</p>
]]></description><pubDate>Fri, 27 Feb 2026 23:07:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47187196</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47187196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47187196</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>There are 2 layers of compression:<p>- ZSTD (actual data compression)<p>- De-duplication (i.e. what you're saying)<p>Although AFAIK it's not "just point to it" but rather storing sorted data and being able to say "the next 2M rows have the same PR Title"</p>
]]></description><pubDate>Fri, 27 Feb 2026 19:33:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47184537</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47184537</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47184537</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>Mendral co-founder and post author here.<p>I agree with your statement and explained in a few other comments how we're doing this.<p>tl;dr:<p>- Something happens that needs investigating<p>- Main (Opus) agent makes a focused plan and spawns sub-agents (Haiku)<p>- They use ClickHouse queries to grab only relevant pieces of logs and return summaries/patterns<p>This is what you would do manually: you're not going to read through 10 TB of logs when something happens; you make a plan, open a few tabs, and start doing narrow, focused searches.</p>
]]></description><pubDate>Fri, 27 Feb 2026 18:48:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47184029</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47184029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47184029</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>From our experience running this, we're seeing patterns like these:<p>- Opus agent wakes up when we detect an incident (e.g. CI broke on main)<p>- It looks at the big picture (e.g. which job broke) and makes a plan to investigate<p>- It dispatches narrowly focused tasks to Haiku sub-agents (e.g. "extract the failing log patterns from commit XXX on job YYY ...")<p>- Sub-agents use the equivalent of "tail", "grep", etc. (via SQL) on a very narrow subset of logs (as directed by Opus) and return only relevant data (so they can interpret INFO logs as actually being the problem)<p>- Parent Opus agent correlates between sub-agents and can decide to spawn more sub-agents to continue the investigation<p>It's no different than what I would do as a human, really. If there are terabytes of logs, I'm not going to read all of them: I'll make a plan, open a bunch of tabs, and surface the interesting bits.</p>
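<p>The fan-out pattern above can be sketched as a planner that dispatches narrow log-research tasks to cheap sub-agents, each running one scoped query and returning a short summary. Everything here is hypothetical scaffolding: the query helper, table/column names, and task shape are illustrative, and real model calls are elided.</p>

```python
# Sketch of the orchestration pattern: planner fans out narrow tasks,
# sub-agents query a small slice of logs and return only summaries.

def clickhouse_query(sql):
    # Stand-in for a real ClickHouse client call.
    return [{"ts": "12:00", "line": "ERROR: connection refused"}]

def haiku_subagent(task):
    # Sub-agent: one narrow, bounded query, then summarize.
    rows = clickhouse_query(
        f"SELECT ts, line FROM ci_logs WHERE job = '{task['job']}' "
        f"AND commit = '{task['commit']}' LIMIT 1000"
    )
    return f"{task['job']}@{task['commit']}: {len(rows)} notable lines, e.g. {rows[0]['line']}"

def opus_orchestrator(incident):
    # Planner: look at the big picture, fan out focused tasks, then
    # correlate the summaries (correlation elided in this sketch).
    tasks = [{"job": j, "commit": incident["commit"]} for j in incident["failed_jobs"]]
    return [haiku_subagent(t) for t in tasks]

summaries = opus_orchestrator({"commit": "abc123", "failed_jobs": ["build", "test"]})
assert len(summaries) == 2
```

The key property is that only the short summaries ever enter the parent agent's context, never the raw logs.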
]]></description><pubDate>Fri, 27 Feb 2026 18:26:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47183745</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47183745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47183745</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>> My experience with LLM generated SQL in OLTP and OLAP platforms has been a mixed bag<p>Models are evolving <i>fast</i>. If your experience is older than a few months, I encourage you to try again.<p>I mean this with the best intentions: it's seriously mind-boggling. We started doing this with Sonnet 4.0 and the relevance was okay at best. Then in September we shifted to Sonnet 4.5 and it's been night and day.<p>Every single model released since then (Opus 4.5, 4.6) has meaningfully improved the quality of results.</p>
]]></description><pubDate>Fri, 27 Feb 2026 18:11:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47183550</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47183550</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47183550</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>We've actually started to gather metrics this week to write that exact post :) Coming soon!</p>
]]></description><pubDate>Fri, 27 Feb 2026 18:01:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47183441</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47183441</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47183441</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>Mendral co-founder here and author of the post.<p>This is an interesting approach. I definitely agree with the problem statement: if the LLM has to filter by error/fatal because of context window constraints, it will miss crucial information.<p>We took a different approach: we have a main agent (Opus 4.6) dispatching "log research" jobs to sub-agents (Haiku 4.5, which is fast and cheap). The sub-agent reads a whole bunch of logs and returns only the relevant parts to the parent agent.<p>This is exactly how coding agents (e.g. Claude Code) do it as well, except instead of having sub-agents use grep/read/tail, ours use plain SQL.</p>
]]></description><pubDate>Fri, 27 Feb 2026 17:55:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47183345</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47183345</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47183345</guid></item><item><title><![CDATA[New comment by aluzzardi in "We gave terabytes of CI logs to an LLM"]]></title><description><![CDATA[
<p>Post author here.<p>Yes, it works really well.<p>1) The latest models are radically better at this. We noticed a massive improvement in quality starting with Sonnet 4.5.<p>2) The context issue is real. We solve it by using sub-agents that read through logs and return only the relevant bits to the parent agent’s context.</p>
]]></description><pubDate>Fri, 27 Feb 2026 16:36:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47182522</link><dc:creator>aluzzardi</dc:creator><comments>https://news.ycombinator.com/item?id=47182522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47182522</guid></item></channel></rss>