<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: zambelli</title><link>https://news.ycombinator.com/user?id=zambelli</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 13 Jun 2026 15:26:15 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=zambelli" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>v0.7.1 is now out - Forge can now sit behind Claude Code! Proxy mode can talk to supported backends and handles format translation. Anthropic > forge > OpenAI; or Anthropic > forge > Anthropic.<p>Will get this ported over into vLLM work and try to get that released soon.<p>Thanks to some kind folks who contributed Docker, token counting, and a handful of PRs I haven't gotten to yet.</p>
]]></description><pubDate>Sun, 24 May 2026 09:33:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48255882</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48255882</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48255882</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Thanks for the thoughtful comment! Let me try to unpack some of what's there and what's missing.<p>Forge is at its core a mechanical reliability layer, whereas a lot of memory/skill management would be more of an orchestration component that the consumer would own.<p>Having Forge stop at the mechanical layer was an intentional design decision, but there's no reason it couldn't grow into more. I think a lot of what you're thinking about is a big model/small model split similar to how CC does it - but that's an orchestrator.<p>Now, where Forge can help with what you're suggesting - I think most of it is there, but needs some wiring from the consumer/orchestrator:
- Forge surfaces information about which guardrails fired: InferenceResult.new_messages carries typed MessageMeta.type — RETRY_NUDGE, STEP_NUDGE, PREREQUISITE_NUDGE, CONTEXT_WARNING, SUMMARY. So every nudge that fired during a run is observable per-step. A consumer could capture that and compare to workflow steps to reconstruct what success looked like.
- Combined with Guardrails.check() > CheckResult, you would have a lot of the journey the model took to get to the answer.
- Forge lets you (actually, requires you to) define the system prompt, any workflow restrictions, and the tools. So if you know something about how your task will behave with a small model, you can include that in the system prompt, or in a tool that's a required step, etc.<p>For integrations into MCPs/etc that house memories and skills, those can be surfaced to the model with Forge in place. Prompt the model to search for tools in the MCP/surface an MCP tool, etc. I've built a consumer that follows this pattern: main agent gets task > main agent eyeballs whether it can be solved on its own > if not, sends to a subagent specialized on that topic (that has access to more tools related to that) - which allows me to keep context lean for each agent.<p>You could do something similar where the model is prompted to use its toolset, but if it's unsure or needs a tool it doesn't have, to call the get_mcp() tool or something to look for better options.<p>Big model v small model now - a couple of ways I think about it.
- You could use big models to go through your workflow a few times, see common patterns, and then use those to define prerequisite and required steps in Forge guardrails when using small models.
- You could use small models the same way the ANTHROPIC_SMALL_FAST_MODEL env var works in Claude Code (this is what the Explore subagent uses, I think). The big model is effectively an orchestrator, and when it recognizes a task is easy, it dispatches a small model to do it, where Forge might make it viable.<p>Hopefully that helps! Forge could certainly elevate some of this to be more native - and I might do that - like a mode that packages up results for you so you don't need to reconstruct the nudge events from hooks firing. But everything should be there to integrate with a memory system with the information required, or with an API/MCP that has more tools or skills for the agent to read.<p>Would love to see the integration if you do it! You'd just need a consumer that captures the events forge returns and packages them up into whatever your memory system is looking for!<p>If you're looking for other ways of ingesting those memories/skills that aren't system prompt, message, or tool result, then that's something I can look into.</p>
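<p>A rough sketch of the kind of consumer described above, under loud assumptions: the MetaType enum and Message dataclass are local stand-ins that only borrow the nudge names from this comment, and package_run is a hypothetical helper. None of this is Forge's actual API; it only shows the shape of "capture the events, package them up".</p>

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MetaType(Enum):
    # Names taken from the comment above; the enum itself is a stand-in.
    RETRY_NUDGE = "retry_nudge"
    STEP_NUDGE = "step_nudge"
    PREREQUISITE_NUDGE = "prerequisite_nudge"
    CONTEXT_WARNING = "context_warning"
    SUMMARY = "summary"


@dataclass
class Message:
    # Hypothetical message shape: a step index, a type tag, and the text.
    step: int
    meta_type: MetaType
    content: str


def package_run(task: str, new_messages: list[Message]) -> dict:
    """Fold the guardrail events of one run into a record for a memory store."""
    events = [
        {"step": m.step, "type": m.meta_type.name, "content": m.content}
        for m in new_messages
    ]
    counts = Counter(e["type"] for e in events)
    return {"task": task, "events": events, "nudge_counts": dict(counts)}


# Example run with made-up events, purely for illustration.
run = [
    Message(1, MetaType.PREREQUISITE_NUDGE, "call read_file before edit_file"),
    Message(2, MetaType.RETRY_NUDGE, "tool arguments were not valid JSON"),
    Message(4, MetaType.RETRY_NUDGE, "unknown tool name"),
]
record = package_run("rename config key", run)
print(record["nudge_counts"])
```

<p>The record is a plain dict so it can be serialized into whatever the memory system expects; comparing nudge_counts across runs of the same task is one way to spot steps that might deserve a prerequisite or required-step rule.</p>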
]]></description><pubDate>Sun, 24 May 2026 00:59:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=48253230</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48253230</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48253230</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>You're very welcome! I've seen some good PRs come through and merges are starting. I need 24-48 hours to get the conference demo and travel sorted then work can continue at a faster pace! No intention of dropping forge, plenty more to do especially around the proxy.</p>
]]></description><pubDate>Sat, 23 May 2026 01:55:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48243778</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48243778</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48243778</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Thanks to everyone for the great discussion! v0.7.0 is out now. It was in flight when this landed - it changes the tool error channel based on dogfooding observations with some larger models, re-runs the eval (numbers shift, but within CI), and most importantly updates the docs! I hope they're clearer now.</p>
]]></description><pubDate>Fri, 22 May 2026 14:59:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48236873</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48236873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48236873</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>This is a really neat writeup, and the empirical data for coding agents is super useful. Will take a closer read and see if there's anything I can easily lift into my harness!</p>
]]></description><pubDate>Thu, 21 May 2026 18:49:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=48227317</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48227317</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48227317</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Merged! Thanks for that catch. I'll try to sequence the in-flight work ASAP to get the vllm branch merged in as a whole.</p>
]]></description><pubDate>Thu, 21 May 2026 04:13:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48217789</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48217789</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48217789</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Oh that's a good find, I'll bookmark this for a GitHub issue.<p>Glad to hear it's working!</p>
]]></description><pubDate>Wed, 20 May 2026 22:05:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=48214864</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48214864</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48214864</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Yeah I got it working as a quick test run to confirm a model issue vs backend issue on a consumer app. It worked on my dual-5070 Ti rig, but I didn't have time to formalize all the way and merge it in. Thanks for linking it!</p>
]]></description><pubDate>Wed, 20 May 2026 20:16:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48213481</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48213481</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48213481</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Nice symmetry with tool call failures being sent to the LLM that made the call without bugging the user. The artifact-generating entity gets the error back, effectively.<p>100% correct, and stackable. Could have topic refusal in the LLM training itself, forge at the tool call layer, and SDLC gates at the workflow level.</p>
]]></description><pubDate>Wed, 20 May 2026 19:04:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=48212477</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48212477</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48212477</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Oh, interesting - thanks for the link. I really haven't explored this but it should slot in fairly easily I think? Gotta dig into it more.</p>
]]></description><pubDate>Wed, 20 May 2026 17:59:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=48211584</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48211584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48211584</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Very cool! I'll try to get an issue open on lmstudio support and add it to the backlog.</p>
]]></description><pubDate>Wed, 20 May 2026 17:58:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=48211564</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48211564</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48211564</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Interesting, catching the problem upstream, effectively. How did you enforce the grammar?</p>
]]></description><pubDate>Wed, 20 May 2026 16:55:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48210642</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48210642</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48210642</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Retry nudges <i>do</i> generate an extra LLM call, and the average time impact of those extra calls is captured in the eval data.<p>But that's the difference between the call failing and succeeding (eventually).<p>On successful calls the presence of forge should be unnoticeable.</p>
]]></description><pubDate>Wed, 20 May 2026 15:57:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209846</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48209846</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209846</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Oh, awesome! I'll take a look.</p>
]]></description><pubDate>Wed, 20 May 2026 15:35:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209468</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48209468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209468</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Thank you! I've been trying to catch those replies and redirect people, but hopefully your comment will be upvoted for others. Very embarrassing to put up the post with the wrong link lol.</p>
]]></description><pubDate>Wed, 20 May 2026 14:50:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=48208821</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48208821</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48208821</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Ohhhh, that's much more interesting. I haven't looked into that at all, but now I'm curious. I'd need to think way more about how to layer that into forge, but the principle could likely be applied somewhere. I get it now.</p>
]]></description><pubDate>Wed, 20 May 2026 13:28:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207388</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48207388</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207388</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>This is not an agentic coding harness. It's a generic tool-calling guardrail stack. I <i>have</i> since built a coding harness on top of Forge, but that's not what this is.</p>
]]></description><pubDate>Wed, 20 May 2026 13:26:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207367</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48207367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207367</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Nice ;). I'll take a closer read of it, that's on me - I am definitely seeing more people looking in this direction as agents start to ramp in production at the enterprise level, which I suspect is highlighting some of these failure modes at higher stakes. And also the cloud frontier API bills.</p>
]]></description><pubDate>Wed, 20 May 2026 13:25:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207350</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48207350</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207350</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>I know :( - I posted the wrong link and now it's there forever.<p>Dashboard is in here: <a href="https://github.com/antoinezambelli/forge/tree/main/docs/results" rel="nofollow">https://github.com/antoinezambelli/forge/tree/main/docs/resu...</a></p>
]]></description><pubDate>Wed, 20 May 2026 13:24:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207327</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48207327</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207327</guid></item><item><title><![CDATA[New comment by zambelli in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"]]></title><description><![CDATA[
<p>Not stupid at all!<p>Some of the older models did do this (like 3.5-era ish I think), and the harness <i>would</i> parse the results.<p>The newer approach on frontier models is structured tool calls: `tool_use` or `tool_calls`. The result then comes back as a separate tool_result rather than a regular message.<p>The failure mode in question is more the model mixing the two: "Sure, I'll read the file: {"tool": "read", "args": {"path": "foo"}}" - that'll break stuff. Other failure modes are the JSON not parsing when it's sent as a structured call, and in some cases the model <i>just</i> emitting text and forgetting the tool call.</p>
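<p>The three failure modes above can be told apart with a small check. This is only a sketch: classify and its two parameters are hypothetical stand-ins rather than any provider's response schema or Forge's own logic, and the "tool" key is borrowed from the example string above.</p>

```python
import json
from typing import Optional


def classify(text: str, tool_args: Optional[str]) -> str:
    """Label one model turn with the failure modes described above.

    `text` is the plain message content; `tool_args` is the raw argument
    string of a structured tool call, or None when no call was made.
    """
    if tool_args is not None:
        try:
            json.loads(tool_args)
        except json.JSONDecodeError:
            return "unparseable_call"  # structured call, but broken JSON
        return "ok"
    # No structured call: look for a tool-call-shaped object inside the prose.
    start = text.find("{")
    if start != -1:
        try:
            obj, _ = json.JSONDecoder().raw_decode(text[start:])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "tool" in obj:
            return "call_embedded_in_text"
    return "text_only"


mixed = 'Sure, I\'ll read the file: {"tool": "read", "args": {"path": "foo"}}'
print(classify(mixed, None))
print(classify("", '{"path": "foo"'))
print(classify("Let me read that file for you.", None))
```

<p>A "text_only" turn is not always a failure, since a final answer looks the same; whether it counts as a forgotten tool call depends on where the workflow is, which this sketch does not track.</p>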
]]></description><pubDate>Wed, 20 May 2026 13:23:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207313</link><dc:creator>zambelli</dc:creator><comments>https://news.ycombinator.com/item?id=48207313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207313</guid></item></channel></rss>