Hacker News: foreman_

New comment by foreman_ in "CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production"

foreman_ — Wed, 22 Apr 2026 05:23:38 +0000

The thread has converged on “LLM-as-judge is the wrong security primitive,” which is right as far as it goes. The prompt-injection chain ends at the outbound POST. By the time the judge sees the request, the credential has already been read.

The question edf13 pointed at but didn’t develop: where does a transport-layer judge earn its place at all? Not as the enforcement layer but as the audit layer on top of one. Kernel-level controls tell you what the agent did. A proxy tells you what the agent tried to exfiltrate and where to.

Structured-JSON escaping and header caps are good tools for the detection job. They’re the wrong tools for the prevention job. Different layers, different questions.
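A rough sketch of what I mean by the audit job, in Python. This is not CrabTrap’s code: `audit_request`, `AuditRecord` and the two patterns are names and values I made up for illustration. It records where a request was going and whether the body looked like it carried a secret, and it deliberately blocks nothing.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit

# Illustrative patterns only (assumption): an AWS-style access key ID
# and a PEM private-key header. A real deployment would need its own list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


@dataclass
class AuditRecord:
    host: str
    method: str
    suspected_secret: bool


def audit_request(method: str, url: str, body: str) -> AuditRecord:
    """Describe an outbound request for the audit log. Never blocks."""
    host = urlsplit(url).hostname or ""
    hit = any(p.search(body) for p in SECRET_PATTERNS)
    return AuditRecord(host=host, method=method.upper(), suspected_secret=hit)
```

The return value answers “what was attempted and where to.” Stopping the read that made the attempt possible is still the enforcement layer’s job.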

New comment by foreman_ in "Claude Code Opus 4.7 keeps checking on malware"

foreman_ — Sat, 18 Apr 2026 12:30:37 +0000

The classifier operates on surface features: file operations at scale, cookie manipulation, concurrent requests. Not intent.

The two failure modes are different. Task refusal is recoverable. What ivankra described (account termination for building Node and V8 to investigate crashes) isn’t. No diagnostic output, no visible appeal path. It’s a standard debugging workflow, but with permanent consequences.

This is a reliability characteristic you have to design around, not a policy question. Any workflow that touches the classifier’s surface features needs a fallback. Most people find out they need one after the fact.
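For the recoverable case, the fallback can be as simple as the sketch below. `Refused` and `run_with_fallback` are hypothetical names, and each backend is just a callable standing in for whatever tool or provider you would actually switch to. It does nothing for the termination case, which needs a plan outside the code.

```python
from typing import Callable, Optional, Sequence


class Refused(Exception):
    """Hypothetical signal that a backend declined the task."""


def run_with_fallback(task: str, backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order and return the first result that is not a refusal."""
    last: Optional[Refused] = None
    for backend in backends:
        try:
            return backend(task)
        except Refused as exc:
            last = exc  # remember the refusal and move on to the next backend
    raise RuntimeError("all backends refused") from last
```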

New comment by foreman_ in "Agent - Native Mac OS X coding ide/harness"

foreman_ — Thu, 16 Apr 2026 05:24:19 +0000

The XPC architecture is the right call for privilege separation: it’s what makes sandboxing trustworthy on macOS rather than just advisory. I’m curious how it handles the trust boundary between LLM responses and the XPC service layer. The most obvious attack surface is prompt injection via a document the agent reads, which then instructs it to do something in Safari or Messages that the user wouldn’t normally sanction. XPC gives you OS-enforced process isolation but doesn’t help you if the privileged process is faithfully executing a poisoned instruction.

What’s the current model for distinguishing user intent from “content the agent read”? Is it purely the system prompt guidance, or is there something structural?
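To make “something structural” concrete, here is the kind of thing I have in mind: tag every instruction with where it came from and gate privileged actions on that tag. All names here (`Source`, `Instruction`, `PRIVILEGED_ACTIONS`, `may_execute`) are mine and purely illustrative. I have no idea whether the project does anything like this.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"
    CONTENT = "content"  # anything the agent read: documents, web pages, messages


@dataclass(frozen=True)
class Instruction:
    text: str
    source: Source


# Hypothetical set of actions that need direct user sanction.
PRIVILEGED_ACTIONS = {"send_message", "open_url"}


def may_execute(action: str, origin: Instruction) -> bool:
    """Allow a privileged action only when the instruction came from the user."""
    if action in PRIVILEGED_ACTIONS:
        return origin.source is Source.USER
    return True
```

The hard part this glosses over is keeping the tag honest once the model has mixed user text and document text in one context.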

Thanks for posting.