<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Oxlamarr</title><link>https://news.ycombinator.com/user?id=Oxlamarr</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 21 May 2026 17:22:05 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Oxlamarr" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Oxlamarr in "Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview"]]></title><description><![CDATA[
<p>The harness point is probably the most important part here. With terminal agents, it often feels like the benchmark is measuring the whole loop: model, prompt, tool interface, retry policy, timeout handling, and recovery from bad shell commands.</p><p>Do you have a sense of which part contributed most to the jump?</p>
]]></description><pubDate>Tue, 28 Apr 2026 11:00:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47932770</link><dc:creator>Oxlamarr</dc:creator><comments>https://news.ycombinator.com/item?id=47932770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47932770</guid></item><item><title><![CDATA[New comment by Oxlamarr in "Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite"]]></title><description><![CDATA[
<p>Very cool. Is the bottleneck under load mostly SQLite write throughput, or the WAL notification layer?</p>
]]></description><pubDate>Fri, 24 Apr 2026 10:00:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47888014</link><dc:creator>Oxlamarr</dc:creator><comments>https://news.ycombinator.com/item?id=47888014</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47888014</guid></item><item><title><![CDATA[New comment by Oxlamarr in "DeepSeek v4"]]></title><description><![CDATA[
<p>The speed of progress here is wild. It feels like the hard part is shifting from having access to a strong model to actually building trustworthy systems around it.</p>
]]></description><pubDate>Fri, 24 Apr 2026 09:57:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47887978</link><dc:creator>Oxlamarr</dc:creator><comments>https://news.ycombinator.com/item?id=47887978</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47887978</guid></item><item><title><![CDATA[New comment by Oxlamarr in "Critical RCE Vulnerability in LiteLLM Proxy"]]></title><description><![CDATA[
<p>This is exactly why we can't just wrap APIs around LLMs and assume the result is secure. The execution layer needs to be completely decoupled from the generation layer.</p><p>When your proxy or agent framework inevitably gets compromised (like this RCE), the blast radius is everything it has access to. We desperately need strict, fail-closed policy engines sitting between the AI infrastructure and the actual consequence/execution APIs. If the execution layer requires cryptographic proof (like mTLS or DPoP) for every single action, an RCE in the LLM proxy doesn't automatically mean a compromised database or stolen funds.</p>
]]></description><pubDate>Wed, 22 Apr 2026 14:37:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47864282</link><dc:creator>Oxlamarr</dc:creator><comments>https://news.ycombinator.com/item?id=47864282</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47864282</guid></item><item><title><![CDATA[New comment by Oxlamarr in "Show HN: An MCP server that fact-checks AI bug diagnoses against AST evidence"]]></title><description><![CDATA[
<p>Love this architectural approach. Using probabilistic models to verify other probabilistic models is just turtles all the way down, so anchoring the agent to deterministic AST evidence is exactly the right move.</p><p>I've been working on the exact same philosophical problem, but at the production execution layer rather than the dev tooling layer. I built a zero-trust policy engine that sits right before an AI agent triggers a real-world consequence (like a financial transaction or DB write), requiring deterministic, cryptographically verifiable proof before allowing the execution.</p><p>It’s incredibly refreshing to see this strict, "fail-closed" deterministic fact-checking mindset being applied to the debugging phase too. Awesome work on the implementation!</p>
]]></description><pubDate>Wed, 22 Apr 2026 14:27:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47864149</link><dc:creator>Oxlamarr</dc:creator><comments>https://news.ycombinator.com/item?id=47864149</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47864149</guid></item></channel></rss>