Hacker News: Oxlamarr

New comment by Oxlamarr in "Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview"

Oxlamarr — Tue, 28 Apr 2026 11:00:45 +0000

The harness point is probably the most important part here. With terminal agents, it often feels like the benchmark is measuring the whole loop: model, prompt, tool interface, retry policy, timeout handling, and recovery from bad shell commands.

Do you have a sense of which part contributed most to the jump?

New comment by Oxlamarr in "Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite"

Oxlamarr — Fri, 24 Apr 2026 10:00:36 +0000

Very cool. Is the bottleneck under load mostly SQLite write throughput, or the WAL notification layer?

New comment by Oxlamarr in "DeepSeek v4"

Oxlamarr — Fri, 24 Apr 2026 09:57:19 +0000

The speed of progress here is wild. It feels like the hard part is shifting from having access to a strong model to actually building trustworthy systems around it.

New comment by Oxlamarr in "Critical RCE Vulnerability in LiteLLM Proxy"

Oxlamarr — Wed, 22 Apr 2026 14:37:28 +0000

This is exactly why we can't just wrap APIs around LLMs and assume the result is secure. The execution layer needs to be completely decoupled from the generation layer.

When your proxy or agent framework inevitably gets compromised (like this RCE), the blast radius is everything it has access to. We desperately need strict, fail-closed policy engines sitting between the AI infrastructure and the actual consequence/execution APIs. If the execution layer requires cryptographic proof (like mTLS or DPoP) for every single action, an RCE in the LLM proxy doesn't automatically mean a compromised database or stolen funds.
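To make "fail-closed" concrete, here is a minimal sketch of the kind of gate I mean, not a real implementation. The agent IDs, action names, the `ALLOWED` table, and the refund limit are all hypothetical. The point is only that anything not explicitly allowed, or any rule that errors out, ends in a denial.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Action:
    agent_id: str
    name: str                      # e.g. "db.read" (hypothetical action name)
    amount: Optional[float] = None


# Hypothetical allowlist: (agent, action) -> predicate that must return True.
ALLOWED: Dict[Tuple[str, str], Callable[[Action], bool]] = {
    ("support-agent", "db.read"): lambda a: True,
    ("billing-agent", "payments.refund"): lambda a: a.amount <= 100.0,
}


def authorize(action: Action) -> bool:
    """Return True only when an explicit rule approves the action."""
    rule = ALLOWED.get((action.agent_id, action.name))
    if rule is None:
        return False               # no rule: deny by default
    try:
        return rule(action) is True
    except Exception:
        return False               # a broken rule or bad input also denies
```

A compromised proxy that asks for `db.write` as `support-agent` gets nothing, because no rule exists for that pair, and a refund request with a missing amount is denied because the rule raises instead of approving.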

New comment by Oxlamarr in "Show HN: An MCP server that fact-checks AI bug diagnoses against AST evidence"

Oxlamarr — Wed, 22 Apr 2026 14:27:27 +0000

Love this architectural approach. Using probabilistic models to verify other probabilistic models is just turtles all the way down, so anchoring the agent to deterministic AST evidence is exactly the right move.

I've been working on the exact same philosophical problem, but at the production execution layer rather than the dev tooling layer. I built a zero-trust policy engine that sits right before an AI agent triggers a real-world consequence (like a financial transaction or DB write), requiring deterministic, cryptographically verifiable proof before allowing the execution.
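A rough sketch of the "proof before execution" idea, under loud assumptions: this uses a shared-secret HMAC from the Python standard library as a simple stand-in, not my actual engine and not mTLS or DPoP. The `SECRET` key, the request shape, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

SECRET = b"demo-key-not-for-production"  # hypothetical shared key


def sign(request: dict, key: bytes = SECRET) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding."""
    payload = json.dumps(request, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def execute_if_proven(request: dict, proof: str, key: bytes = SECRET) -> str:
    """Run the action only if the proof matches this exact request."""
    if not isinstance(proof, str):
        return "denied"
    expected = sign(request, key)
    # Constant-time comparison; any mismatch fails closed.
    if not hmac.compare_digest(expected.encode(), proof.encode()):
        return "denied"
    return "executed"  # placeholder for the real side effect
```

Because the tag covers the whole request, a proof issued for one write cannot be replayed against a modified one: changing any field changes the expected tag and the call is denied.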

It’s incredibly refreshing to see this strict, "fail-closed" deterministic fact-checking mindset being applied to the debugging phase too. Awesome work on the implementation!