<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: galsapir</title><link>https://news.ycombinator.com/user?id=galsapir</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 07:40:17 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=galsapir" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>Hey, thanks! I do wonder about that. I think that even if the signals are subtler for code smell specifically, other forms of AI-driven averageness (especially in areas where we can't RLVR the models to perfection) might still be present. But yeah, I wonder how those thoughts will age (and how we'll update our priors accordingly).</p>
]]></description><pubDate>Sun, 12 Apr 2026 05:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736449</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47736449</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736449</guid></item><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>yeah, I was really thinking about what the best "umbrella term" would be here. Since "LLM" is used so widely in a really specific sense and "AI systems" felt niche, I ended up with "LMs". Idk, up for debate...</p>
]]></description><pubDate>Sat, 11 Apr 2026 19:09:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47733172</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47733172</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47733172</guid></item><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>haha that's a style choice (takes more work to get lowercase text these days). But yeah legit ;-)</p>
]]></description><pubDate>Sat, 11 Apr 2026 18:19:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47732784</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47732784</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47732784</guid></item><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>Thanks! I'll check it out.</p>
]]></description><pubDate>Sat, 11 Apr 2026 17:19:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47732280</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47732280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47732280</guid></item><item><title><![CDATA[Borges' cartographers and the tacit skill of reading LM output]]></title><description><![CDATA[
<p>Article URL: <a href="https://galsapir.github.io/sparse-thoughts/2026/04/11/map-and-territory/">https://galsapir.github.io/sparse-thoughts/2026/04/11/map-and-territory/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47730229">https://news.ycombinator.com/item?id=47730229</a></p>
<p>Points: 38</p>
<p># Comments: 10</p>
]]></description><pubDate>Sat, 11 Apr 2026 13:06:34 +0000</pubDate><link>https://galsapir.github.io/sparse-thoughts/2026/04/11/map-and-territory/</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47730229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47730229</guid></item><item><title><![CDATA[New comment by galsapir in "Executing programs inside transformers with exponentially faster inference"]]></title><description><![CDATA[
<p>one of the most interesting pieces I've read recently. Not sure I agree with all the statements there (e.g., that without execution the system has no comprehension), but extremely cool</p>
]]></description><pubDate>Thu, 12 Mar 2026 14:43:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47351279</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47351279</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47351279</guid></item><item><title><![CDATA[New comment by galsapir in "Best read of 2026 so far was written in 1880"]]></title><description><![CDATA[
<p>The part that made me laugh out loud: Dostoyevsky's description of medicine becoming "too specialized" — one doctor for the right nostril and another for the left. That's from a conversation between Ivan and the devil. Written in 1880.
The rest of the novel is like that too — the narrator is semi-omniscient but explicitly unreliable and self-conscious about it, the characters' inner lives contradict their stated beliefs, and the psychological model overall is more sophisticated than most of what we use in social science today.</p>
]]></description><pubDate>Sun, 08 Mar 2026 14:57:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47297842</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47297842</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47297842</guid></item><item><title><![CDATA[Best read of 2026 so far was written in 1880]]></title><description><![CDATA[
<p>Article URL: <a href="https://galsapir.github.io/sparse-thoughts/2026/03/07/reading-q1-2026/">https://galsapir.github.io/sparse-thoughts/2026/03/07/reading-q1-2026/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47297841">https://news.ycombinator.com/item?id=47297841</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 08 Mar 2026 14:57:53 +0000</pubDate><link>https://galsapir.github.io/sparse-thoughts/2026/03/07/reading-q1-2026/</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47297841</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47297841</guid></item><item><title><![CDATA[Anthropic launched community ambassador program]]></title><description><![CDATA[
<p>Article URL: <a href="https://claude.com/community/ambassadors">https://claude.com/community/ambassadors</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47288679">https://news.ycombinator.com/item?id=47288679</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 07 Mar 2026 15:50:24 +0000</pubDate><link>https://claude.com/community/ambassadors</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47288679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47288679</guid></item><item><title><![CDATA[New comment by galsapir in "Files are the interface humans and agents interact with"]]></title><description><![CDATA[
<p>nice, esp. liked: "our memories, our thoughts, our designs should outlive the software we used to create them"</p>
]]></description><pubDate>Sat, 07 Mar 2026 15:14:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47288368</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47288368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47288368</guid></item><item><title><![CDATA[LLMs nudging research towards a lukewarm middle]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.nature.com/articles/s44271-026-00428-5">https://www.nature.com/articles/s44271-026-00428-5</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47142692">https://news.ycombinator.com/item?id=47142692</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 24 Feb 2026 20:41:58 +0000</pubDate><link>https://www.nature.com/articles/s44271-026-00428-5</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47142692</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47142692</guid></item><item><title><![CDATA[New comment by galsapir in "Show HN: Rune | A spec pattern for consistent AI code generation"]]></title><description><![CDATA[
<p>this seems interesting, do you have an example of a use case you found it helped with? (a red/green pattern where, without RUNE, it failed)</p>
]]></description><pubDate>Sat, 14 Feb 2026 16:04:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47015538</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47015538</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47015538</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>I've been writing about how I use AI tools as a researcher working in health AI — specifically the tension between leveraging them and staying engaged enough to catch when they're wrong.
This post is about a specific version of that problem: the models have gotten good enough that my default is to trust the output, and the threshold for "worth checking" keeps drifting upward. So I built a simple Claude Code skill that sends high-stakes work to a different model family for a second opinion — one call, not a multi-agent debate.
The honest result: the first real test (reviewing an architecture spec) scored maybe 6/10. It caught one genuine security finding and missed the deeper domain questions entirely. That gap maps onto something I keep running into in evals — tools can check structural form (missing error handling, security anti-patterns) but struggle with essence (does this actually work the way the spec assumes? are the clinical guardrails robust?).
Still worth it as a lightweight intervention against the drift toward not checking at all. The skill is open source if anyone wants to try or improve it.</p>
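<p>For the curious, a minimal sketch of the shape of that call, using the openai Python client; the prompt, model name, and function here are illustrative stand-ins, not the skill's actual code:</p>
<pre><code># Sketch: one cross-family "second opinion" call, no multi-agent debate.
# The model name and prompt are illustrative, not the skill's real values.
from openai import OpenAI

client = OpenAI()  # reviewer from a different model family

REVIEW_PROMPT = (
    "You are a second reviewer. Review the following work product for "
    "errors, security issues, and unstated assumptions. Be specific."
)

def second_opinion(work: str, model: str = "gpt-4o") -> str:
    """Send high-stakes work out for one independent review."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": work},
        ],
    )
    return response.choices[0].message.content
</code></pre>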
]]></description><pubDate>Wed, 11 Feb 2026 16:58:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46977461</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46977461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46977461</guid></item><item><title><![CDATA[New comment by galsapir in "Bring receipts from your Claude Code sessions"]]></title><description><![CDATA[
<p>haha nice for freelance work!</p>
]]></description><pubDate>Fri, 06 Feb 2026 21:17:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46918258</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46918258</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46918258</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>opus 4.6 came out yesterday so I tried it and built two things. i think the model is smoother: it picks up intent faster, asks better questions in interview-style flows, and is more willing to loop for 8+ minutes. the tools: an interview command for claude code with depth checkpoints, and a markdown annotator for actually reviewing what comes back instead of staying in the "fix it plz" loop.</p>
]]></description><pubDate>Fri, 06 Feb 2026 14:11:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46913026</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46913026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46913026</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>There's been a lot of discussion around this lately, especially after the Anthropic study. I don't have answers — this is more an attempt to articulate the problem and some mental frameworks that have been useful. Curious what practices others have found helpful.</p>
]]></description><pubDate>Tue, 03 Feb 2026 21:50:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46877822</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46877822</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46877822</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>we spent a few months building evals for a health agent (and the agent itself!). tried to apply anthropic's framework to a real system looking at CGM data + diet.
some of it worked. we got decent at checking form — citations exist, tools were called, numbers trace back. the harder part was essence — is this clinically appropriate? actually helpful? we didn't really solve that.
curious if others building health/bio agents have found ways around this, or if everyone's just accepting fuzzy metrics for the stuff that matters.</p>
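<p>To make the form/essence split concrete, a toy sketch of the kind of "form" checks that worked; the field and tool names here are hypothetical placeholders:</p>
<pre><code># Toy sketch of structural "form" checks on one agent answer.
# Field names and expected tools are hypothetical placeholders.
def check_form(answer: dict, trace: dict) -> dict:
    return {
        # every citation in the answer resolves to a retrieved source
        "citations_exist": set(answer["citations"]) <= set(trace["sources"]),
        # the tools we expected were actually called
        "tools_called": {"read_cgm", "read_diet_log"} <= set(trace["tool_calls"]),
        # every number in the answer traces back to some tool output
        "numbers_trace_back": set(answer["numbers"]) <= set(trace["tool_output_numbers"]),
    }
</code></pre>
<p>None of which says anything about whether the advice was clinically sound; that's the gap.</p>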
]]></description><pubDate>Thu, 29 Jan 2026 20:27:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46816093</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46816093</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46816093</guid></item><item><title><![CDATA[New comment by galsapir in "How do you evaluate a foundation model before you know what it's for?"]]></title><description><![CDATA[
<p>foundation models in biology still haven't proven they're worth it vs simpler methods (imo). we just published one in Nature, and i feel like i spent more time on "how will we know this worked" than on the model itself. the hard part was (mostly) deciding what success even means. open to thoughts</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:01:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46737868</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46737868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737868</guid></item><item><title><![CDATA[How do you evaluate a foundation model before you know what it's for?]]></title><description><![CDATA[
<p>Article URL: <a href="https://galsapir.github.io/sparse-thoughts/2026/01/23/what-is-a-good-fm/">https://galsapir.github.io/sparse-thoughts/2026/01/23/what-is-a-good-fm/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46737867">https://news.ycombinator.com/item?id=46737867</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:01:28 +0000</pubDate><link>https://galsapir.github.io/sparse-thoughts/2026/01/23/what-is-a-good-fm/</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46737867</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737867</guid></item><item><title><![CDATA[Ask HN: Anyone using Claude Agent SDK in production?]]></title><description><![CDATA[
<p>We're evaluating agent frameworks for a health AI product and leaning toward Anthropic's Claude Agent SDK. Did a quick POC and liked the simplicity: clean @tool decorator, native MCP support, flat mental model (rough sketch of the POC below).
But I'm finding fewer production case studies compared to LangGraph or similar. Curious about:<p>Multi-turn conversation handling: does it manage state well, or do you thread history manually?
Long-running tasks (minutes/hours): any gotchas with timeouts or checkpointing?
The latency overhead people mention (~12s per query, per one GitHub issue): is this still an issue, or has it improved?
General production rough edges we should know about?<p>For context: most of our context is pre-computed, with occasional JIT tool calls. We're comparing against Pydantic AI and LangGraph but trying to avoid over-engineering.
Appreciate any war stories.</p>
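<p>The rough shape of the POC, written from memory against the Python claude-agent-sdk, so treat the exact signatures as approximate; the glucose tool is a hypothetical example:</p>
<pre><code># Rough shape of the POC: one in-process MCP tool, one query.
# Signatures are approximate; get_glucose is a hypothetical tool.
import asyncio
from claude_agent_sdk import (
    ClaudeAgentOptions, create_sdk_mcp_server, query, tool,
)

@tool("get_glucose", "Fetch recent CGM readings", {"hours": int})
async def get_glucose(args):
    hours = args["hours"]  # stand-in for a real CGM lookup
    return {"content": [{"type": "text", "text": f"last {hours}h: ..."}]}

server = create_sdk_mcp_server(name="health", version="1.0.0",
                               tools=[get_glucose])

async def main():
    options = ClaudeAgentOptions(
        mcp_servers={"health": server},
        allowed_tools=["mcp__health__get_glucose"],
    )
    async for message in query(prompt="Summarize my last 6h of glucose",
                               options=options):
        print(message)

asyncio.run(main())
</code></pre>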
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46679473">https://news.ycombinator.com/item?id=46679473</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 19 Jan 2026 14:38:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46679473</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46679473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46679473</guid></item></channel></rss>