New comment by kevinluddy39 in "EvanFlow – A TDD driven feedback loop for Claude Code"

kevinluddy39 — Tue, 28 Apr 2026 16:58:35 +0000

The per-agent-green / merge-broken pattern is the diagonal failure mode of multi-agent systems. Unit testing each agent in isolation captures correctness within scope; what stays invisible is the seam at handoff: argument schemas drifting between coder and overseer, response shapes that satisfy each agent's local validator but break the next agent's parser, and error messages that get summarized into "no error" by the time they reach the orchestrator.

Built tool-call-grader to instrument exactly this: session-level statistics across the tool-call trace plus six pathology detectors (silent failure, tool fixation, response bloat, schema drift, irrelevant response, cascading failure). On a hand-designed multi-agent benchmark, 7/7 scenarios passed, including specifically the case you're describing: per-agent results look fine, schema-drift fires at the seam.

The detector runs over the trace, not the output, so it catches the failure several turns before it shows up as a "weird merge bug" the human has to debug. MIT licensed, npx-installable. Methodology in profile.
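To make the idea of checking the seam rather than each agent concrete, here is a minimal sketch of a schema-drift check over a handoff trace. It is not tool-call-grader's implementation or API: the `Handoff` record, the `detect_schema_drift` function, and the sample payloads are all hypothetical, and the "first shape seen on a seam is the baseline" rule is an assumption chosen for brevity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Handoff:
    """One producer -> consumer handoff in a trace (hypothetical record shape)."""
    turn: int
    producer: str
    consumer: str
    payload: dict[str, Any]


def shape_of(payload: dict[str, Any]) -> dict[str, str]:
    """Reduce a payload to field name -> type name, ignoring the values."""
    return {key: type(value).__name__ for key, value in payload.items()}


def detect_schema_drift(trace: list[Handoff]) -> list[dict[str, Any]]:
    """Flag handoffs whose shape differs from the first one seen on the same seam."""
    # Assumption: the first payload on each (producer, consumer) seam is the baseline.
    baseline: dict[tuple[str, str], dict[str, str]] = {}
    findings = []
    for handoff in trace:
        seam = (handoff.producer, handoff.consumer)
        shape = shape_of(handoff.payload)
        expected = baseline.setdefault(seam, shape)
        if shape == expected:
            continue
        findings.append({
            "turn": handoff.turn,
            "seam": seam,
            "missing": sorted(expected.keys() - shape.keys()),
            "added": sorted(shape.keys() - expected.keys()),
            "retyped": sorted(
                k for k in expected.keys() & shape.keys() if expected[k] != shape[k]
            ),
        })
    return findings


# Hypothetical trace: each payload could pass a loose local check,
# but the third one renames a field and changes a type.
trace = [
    Handoff(1, "coder", "overseer", {"path": "a.py", "diff": "+x", "tests_passed": True}),
    Handoff(2, "coder", "overseer", {"path": "b.py", "diff": "+y", "tests_passed": True}),
    Handoff(3, "coder", "overseer", {"file": "c.py", "diff": "+z", "tests_passed": "yes"}),
]
findings = detect_schema_drift(trace)
for finding in findings:
    print(finding)
```

The check never looks at the final merged output; it only compares payload shapes turn by turn, which is why a drift like the renamed `path` field can surface at the turn where it happens rather than later as a merge bug.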

I ran retrieval-auditor against LangChain's RAG quickstart, 5/6 flagged

kevinluddy39 — Mon, 27 Apr 2026 22:18:37 +0000

Article URL: https://github.com/kevin-luddy39/contrarianAI/tree/main/tools/retrieval-auditor/examples/langchain-quickstart-teardown

Comments URL: https://news.ycombinator.com/item?id=47928150

Points: 1

# Comments: 0

AI Heartache

kevinluddy39 — Wed, 15 Apr 2026 20:31:43 +0000

Article URL: https://github.com/kevin-luddy39/context-inspector/

Comments URL: https://news.ycombinator.com/item?id=47784837

Points: 1

# Comments: 0

Hacker News: kevinluddy39
