New comment by Damianf19 in "Launch HN: Voker (YC S24) – Analytics for AI Agents"

Damianf19 — Tue, 12 May 2026 16:45:38 +0000

What's the data model that lets you compare agents that differ a lot in tools/policies? Curious if you normalize on the "what did the user actually accomplish" layer or on raw token/turn metrics, because the two paint completely different pictures of "is this agent working." We struggle with this on the eval side of our own product (email pipeline outcomes, not agents, but same shape).

