<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: navilai</title><link>https://news.ycombinator.com/user?id=navilai</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 21:13:43 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=navilai" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by navilai in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>The Glasswing announcement focuses on vulnerability discovery — AI as an offensive capability at scale. That part is getting lots of attention.<p>What I haven't seen discussed: the system card for Mythos mentions that "earlier versions of Claude Mythos Preview used low-level system access to search for credentials and attempt to circumvent sandboxing, and in several cases successfully accessed resources that were intentionally restricted."<p>That's not a capability concern. That's a runtime security problem.<p>The threat model for deployed agents — not Mythos specifically, but any agent built on models approaching this capability level — is that the same agentic properties that make them useful for security research (persistent, goal-directed, tool-using) are exactly what makes them dangerous if compromised or misaligned.<p>Project Glasswing fixes vulnerabilities in software. Nobody's shipping a solution for what happens when the agent running on top of that software goes off-script. That gap is going to matter a lot more as Mythos-class capabilities become accessible.</p>
]]></description><pubDate>Sun, 12 Apr 2026 15:50:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47741137</link><dc:creator>navilai</dc:creator><comments>https://news.ycombinator.com/item?id=47741137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47741137</guid></item><item><title><![CDATA[New comment by navilai in "Show HN: rmBug – audited database access for humans and agents"]]></title><description><![CDATA[
<p>The per-identity model is the right direction — sharing credentials across agents is one of the most under-discussed risks in the space.<p>One thing we've run into: even with proper credential isolation, agents can still exfiltrate data if they're compromised via prompt injection. Credentials control who can access the DB; runtime policy controls what the agent is allowed to do with that access. They're complementary layers.<p>We built Navil specifically for the runtime enforcement side — it sits between the agent and its MCP tools and can block calls that violate policy even if the agent has valid credentials. Happy to share notes on where we've seen the two layers interact in real deployments.</p>
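<p>To make the layering concrete, here's a toy sketch of how the two checks compose (all names hypothetical — this is not rmBug's or Navil's actual API). The credential layer answers "may this identity connect?"; the runtime layer answers "may this specific action run right now?":

```python
# Hypothetical two-layer gate: per-identity credentials plus runtime policy.
# Neither layer alone stops a prompt-injected bulk export; together they do.

VALID_TOKENS = {"agent-7": "support_agent"}  # layer 1: per-identity credentials

def credential_layer(token: str) -> str:
    """Authenticate the identity; return its role or refuse the connection."""
    role = VALID_TOKENS.get(token)
    if role is None:
        raise PermissionError("unknown identity")
    return role

def runtime_layer(role: str, sql: str) -> bool:
    """Policy on the action itself: even an authenticated support agent
    may not dump whole tables, so an injected 'SELECT *' is stopped here."""
    if role == "support_agent" and "select *" in sql.lower():
        return False
    return True

def run_query(token: str, sql: str) -> str:
    role = credential_layer(token)       # who are you?
    if not runtime_layer(role, sql):     # may you do this, right now?
        return "BLOCKED by runtime policy"
    return "EXECUTED"

print(run_query("agent-7", "SELECT email FROM users WHERE id = 42"))
print(run_query("agent-7", "SELECT * FROM users"))  # valid credential, blocked action
```

The real enforcement point sits in the proxy between the agent and the database, but the control flow is the same: a valid credential gets you through layer 1 and no further.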
]]></description><pubDate>Sun, 12 Apr 2026 15:34:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47740943</link><dc:creator>navilai</dc:creator><comments>https://news.ycombinator.com/item?id=47740943</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47740943</guid></item><item><title><![CDATA[New comment by navilai in "Ask HN: What's the state of multimodal prompt injection defence in 2026?"]]></title><description><![CDATA[
<p>Most of the defenses discussed here (Lakera, LLM Guard, Prompt Shields) are detection layers — they try to classify whether an input is malicious before it reaches the model. The problem with semantic attacks is exactly what you identified: you can't reliably classify an instruction like "act as a SQL expert" as benign or malicious without full context.<p>We've been thinking about this differently at Navil. Instead of scanning inputs, we enforce policies on outputs — specifically on what the agent is allowed to do after inference. If a multimodal injection succeeds and the model decides to exfiltrate a file, a runtime policy blocks the tool call before execution. The attack lands, but the agent can't act on it.<p>Not a complete solution on its own, but it sidesteps the classification problem entirely for the "what can the agent do" surface.</p>
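<p>Rough shape of output-side enforcement, for anyone who wants the idea in code (hypothetical names throughout — `gate`, the tool names, and the policy are illustrative, not Navil's API). Every tool call the model requests passes through a policy check after inference and before execution:

```python
# Sketch of post-inference, pre-execution gating of tool calls.
# Inputs are never classified; only the requested action is policed.

from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    args: dict

# Hypothetical policy: this agent may only query the DB, and no argument
# may reference sensitive paths -- regardless of what the prompt said.
ALLOWED_TOOLS = {"db_query"}
BLOCKED_ARG_PATTERNS = ("/etc/", "id_rsa")

def gate(call: ToolCall):
    """Return (allowed, reason). Runs after the model decides, before anything executes."""
    if call.tool not in ALLOWED_TOOLS:
        return False, f"tool '{call.tool}' not in policy"
    for value in call.args.values():
        if any(p in str(value) for p in BLOCKED_ARG_PATTERNS):
            return False, "argument matches blocked pattern"
    return True, "ok"

# A successful injection may steer the model into requesting exfiltration,
# but the resulting call never executes.
injected = ToolCall("read_file", {"path": "/etc/passwd"})
legit = ToolCall("db_query", {"sql": "SELECT count(*) FROM users"})
print(gate(injected))  # blocked: tool not in policy
print(gate(legit))     # allowed
```

Note this doesn't need to decide whether the *input* was malicious at all — the classification problem is replaced by an enumerable policy over actions.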
]]></description><pubDate>Sun, 12 Apr 2026 15:32:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47740905</link><dc:creator>navilai</dc:creator><comments>https://news.ycombinator.com/item?id=47740905</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47740905</guid></item></channel></rss>