Hacker News: raffaeleg

New comment by raffaeleg in "Show HN: Health billing agent denies claims in 1.2s, offices should know why"

raffaeleg — Thu, 16 Apr 2026 20:50:23 +0000

1.2 seconds is fast enough that providers need a different mental model. You cannot read the denial reason before the next one arrives. The more interesting question is whether the counterparty should be an agent too. At empla.io founders forward denied invoices, disputed charges, or auto-rejected vendor requests and the agent negotiates back inside the thread. Most denial infrastructure was built assuming a human on the other end. That assumption is about to break.

New comment by raffaeleg in "Show HN: Cush – curl your shell, an HTTP tunnel for AI agents"

raffaeleg — Wed, 15 Apr 2026 20:11:02 +0000

Tunneling is one of those things where the experience is invisible when it works and brutal when it does not. At empla.io our agents trigger webhooks from user inboxes and the callback layer is the most painful part of the stack. Does Cush do anything on replay, or do you leave idempotency to the calling agent? That is the piece most agent frameworks ignore.

New comment by raffaeleg in "Show HN: Avec – iOS email app that lets you handle your Gmail inbox in seconds"

raffaeleg — Wed, 15 Apr 2026 20:10:33 +0000

Rapid inbox management is a real gap, especially for founders who live in email. We went the opposite direction with empla.io, AI agents that act inside Gmail so you do not need a new client. There is clearly room for both. What convinced you to build a native client rather than an extension or an MCP server on top of Gmail? Distribution is usually the piece that kills standalone email apps.

New comment by raffaeleg in "Instant 1.0, a backend for AI-coded apps"

raffaeleg — Fri, 10 Apr 2026 18:06:36 +0000

The "predefined patterns reduce token costs" framing resonates. We saw the same thing building empla.io. Every time we left a backend decision open to the agent, it burned 3x to 4x the tokens exploring before committing. The counterintuitive part is that declarative query languages save more tokens than they save for humans, because the agent does not have to plan imperative control flow step by step. Two open questions for the Instant team. How are you handling schema evolution when an agent decides mid-session that it needs a new relation? That is exactly where Supabase falls over for us. And do you expose a cost budget primitive per session, or does the user have to instrument that separately?

New comment by raffaeleg in "Show HN: Zoidmail – email accounts that AI agents can sign up for themselves"

raffaeleg — Fri, 10 Apr 2026 18:05:25 +0000

The proof-of-work signup is a clever bypass for the verification problem. Worth flagging the other side of the coin though. Agents that need to operate inside existing human inboxes, not parallel accounts, are a different design surface entirely. We built empla.io around that premise. 13 AI employees that work through the founder's own Gmail, no new addresses, no separate login. The interesting tension is deliverability: Gmail and Outlook still penalize agent-origin mail heavily, which is why we piggyback on the user's sender reputation. Curious how Zoidmail plans to handle that the moment people try to actually send outbound from these accounts at any real volume.

New comment by raffaeleg in "Ask HN: How are you using multi-agent AI systems in your daily workflow?"

raffaeleg — Sun, 15 Mar 2026 17:49:00 +0000

Curious about the prediction market mechanic, that's the part most people skip. We've been running something similar with Platypi: 6 agents on a simulated trading desk (paper money on Alpaca), specialized roles, coordinating exclusively via email. No dashboard, no human intervention. The coordination patterns that emerged were unexpected. Agents developing implicit trust hierarchies, one risk manager consistently blocking the others, disagreements that resolved faster than any human team would. it's like here: https://platypi.empla.io The architecture question that keeps coming up for us: specialization vs. redundancy. Do you run multiple agents with overlapping domains so they can sanity-check each other, or hard boundaries? We found hard specialization creates blind spots that are hard to catch in real time. What's your failure mode when two agents reach contradictory conclusions and there's no tiebreaker?

New comment by raffaeleg in "We might all be AI engineers now"

raffaeleg — Thu, 12 Mar 2026 16:18:45 +0000

Supervision is the unlock. The pattern that works best for us: every agent action goes through a lightweight policy check before execution. Not a second LLM call — that's too slow and too expensive. A set of deterministic rules that catch the obvious failure modes (wrong format, out-of-scope action, exceeding token budget). The LLM handles the creative reasoning, the supervisor handles the predictable constraints. Think of it as the same reason you don't let a junior dev push to production without CI/CD. The agent is the dev, the supervisor is the pipeline. This approach cut our agent error rate by roughly 60% without adding meaningful latency.

New comment by raffaeleg in "Ask HN: How are teams productionizing AI agents today?"

raffaeleg — Thu, 12 Mar 2026 16:18:16 +0000

The biggest gap between demo and production isn't the model or the framework. It's three boring things: deterministic fallbacks (what happens when the agent fails or hallucinates), observability (can you trace exactly why an agent took action X), and cost controls (token budgets per task, not per call). Most teams get burned by the same pattern: the demo works beautifully, then in production you realize you need to handle the 15% of cases where the agent confidently does the wrong thing. The teams shipping successfully treat agents like junior employees, not autonomous systems. Guardrails first, autonomy second.

New comment by raffaeleg in "Do AI Agents Make Money in 2026? Or Is It Just Mac Minis and Vibes?"

raffaeleg — Tue, 10 Mar 2026 17:19:41 +0000

he agents that make money share one trait: they replace a specific, repeatable human workflow that someone is already paying for. Not "AI assistant that does everything" but "this agent processes inbound leads and routes them with 94% accuracy, replacing 3 hours of daily manual work." The ROI calculation is trivial when framed that way. Where teams get stuck is building horizontal platforms before validating a single vertical use case. Pick one workflow, measure the before/after, ship it. The economics only work when the scope is narrow enough to be measurable.

New comment by raffaeleg in "The Hard Truth About AI Agents: What We Learned Building in Open Source"

raffaeleg — Tue, 10 Mar 2026 17:18:57 +0000

The memory insight here is underappreciated. Once you cross the line from stateless chatbots to stateful agents, memory becomes infrastructure — not a feature you bolt on later. We learned this the hard way: agents that can't reason over what happened yesterday make the same mistakes on loop. The real shift in 2026 isn't better models, it's better state management. Persistent memory, proper context windows that survive across sessions, and failure recovery that doesn't require re-prompting from scratch. That's where the actual engineering challenge lives now.

New comment by raffaeleg in "NIST Seeking Public Comment on AI Agent Security (Deadline: March 9, 2026)"

raffaeleg — Tue, 10 Mar 2026 00:05:33 +0000

The standard security framing on agents borrows from API security, which gets the threat model wrong. API security assumes you can enumerate what a system can do -- access controls work because the action space is static. Agents are different: the action space expands dynamically based on tools available and the instruction given at runtime. The priority for NIST should be distinguishing authorization (who can invoke an agent) from action scope control (what any invocation can trigger). Those are different security primitives, and most current frameworks don't address the second one

New comment by raffaeleg in "Do AI Agents Make Money in 2026? Or Is It Just Mac Minis and Vibes?"

raffaeleg — Sun, 08 Mar 2026 01:13:13 +0000

The framing misses the middle ground where most real value lives: narrow agents with a single well-defined scope running on top of existing workflows. The biggest returns we've seen come from agents that handle one specific, high-frequency, error-prone task, not autonomous systems orchestrating a dozen capabilities. Orchestration overhead in broad autonomous setups often erases the labor savings. Specificity is the variable most teams skip when scoping an agent project, and it's usually the difference between something that ships and something that stays a demo.

New comment by raffaeleg in "Ask HN: How are you using multi-agent AI systems in your daily workflow?"

raffaeleg — Sun, 08 Mar 2026 01:12:28 +0000

We're running a live event at platypi.empla.io — a simulated trading desk where 6 agents coordinate entirely via email with no human in the loop. No shared conversation thread, no central orchestrator. Bozen (supervisor) gets a morning briefing from each PM agent, they argue about positions over email, Mizumo executes. The interesting thing isn't the trading — it's that email as coordination protocol produces naturally auditable, replayable agent behavior. Paper money on Alpaca, but the coordination infrastructure is the point.

New comment by raffaeleg in "(paper money) Hedge Fund staffed by AI Employees (experiment)"

raffaeleg — Thu, 26 Feb 2026 16:29:36 +0000

I'm Raffaele, co-founder with Emanuele. Happy to answer questions on the business side. A few things we've learned so far: the most surprising part isn't that the agents trade, it's how they coordinate. Last night at 3am, Andrej (our crypto PM) noticed a missing sell order and emailed Mizumo (our CFO) to fix it. Done in 12 minutes. No human saw it until morning. The experiment started as a stress test for our core product EMPLA: AI employees that work through email for small businesses. Platypi is us pushing the same architecture to the extreme: what happens when you remove humans entirely? Happy to go deep on architecture, agent design, or lessons learned.