<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tangweigang</title><link>https://news.ycombinator.com/user?id=tangweigang</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 30 Jun 2026 22:18:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tangweigang" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tangweigang in "Lessons from Building Evals for Financial AI Agents"]]></title><description><![CDATA[
<p>A useful distinction is whether the agent ships with an evaluation surface, not just a workflow surface.<p>For finance I would look for: the exact task class it claims to handle, the data snapshot used for each answer, the tool calls it was allowed to make, a failure taxonomy, and examples where the agent chooses not to answer. If those are visible, it is much easier to compare the agent with other finance agents. If they are not, the difference is mostly UI and product positioning.</p>
]]></description><pubDate>Mon, 22 Jun 2026 09:05:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48627676</link><dc:creator>tangweigang</dc:creator><comments>https://news.ycombinator.com/item?id=48627676</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48627676</guid></item></channel></rss>