<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: galsapir</title><link>https://news.ycombinator.com/user?id=galsapir</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 07:40:17 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=galsapir" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>Hey, thanks! I do wonder about that. I think that even if the signals are subtler for code smell specifically, other forms of AI-driven averageness (especially in areas where we can't RLVR the models to perfection) might still be present. But yeah, I wonder how those thoughts will age (and how we'll update our priors accordingly).</p>
]]></description><pubDate>Sun, 12 Apr 2026 05:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736449</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47736449</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736449</guid></item><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>yeah, I was really thinking about what the best "umbrella term" would be here. Since "LLM" is used so widely in a really specific sense and "AI systems" felt niche, I ended up with "LMs". Idk, up for debate...</p>
]]></description><pubDate>Sat, 11 Apr 2026 19:09:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47733172</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47733172</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47733172</guid></item><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>haha that's a style choice (takes more work to get lowercase text these days). But yeah legit ;-)</p>
]]></description><pubDate>Sat, 11 Apr 2026 18:19:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47732784</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47732784</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47732784</guid></item><item><title><![CDATA[New comment by galsapir in "Borges' cartographers and the tacit skill of reading LM output"]]></title><description><![CDATA[
<p>Thanks! I'll check it out.</p>
]]></description><pubDate>Sat, 11 Apr 2026 17:19:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47732280</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47732280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47732280</guid></item><item><title><![CDATA[Borges' cartographers and the tacit skill of reading LM output]]></title><description><![CDATA[
<p>Article URL: <a href="https://galsapir.github.io/sparse-thoughts/2026/04/11/map-and-territory/">https://galsapir.github.io/sparse-thoughts/2026/04/11/map-and-territory/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47730229">https://news.ycombinator.com/item?id=47730229</a></p>
<p>Points: 38</p>
<p># Comments: 10</p>
]]></description><pubDate>Sat, 11 Apr 2026 13:06:34 +0000</pubDate><link>https://galsapir.github.io/sparse-thoughts/2026/04/11/map-and-territory/</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47730229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47730229</guid></item><item><title><![CDATA[New comment by galsapir in "Executing programs inside transformers with exponentially faster inference"]]></title><description><![CDATA[
<p>one of the most interesting pieces I've read recently. Not sure I agree with all the statements there (e.g., that without execution the system has no comprehension), but extremely cool</p>
]]></description><pubDate>Thu, 12 Mar 2026 14:43:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47351279</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47351279</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47351279</guid></item><item><title><![CDATA[New comment by galsapir in "Best read of 2026 so far was written in 1880"]]></title><description><![CDATA[
<p>The part that made me laugh out loud: Dostoyevsky's description of medicine becoming "too specialized" — one doctor for the right nostril and another for the left. That's from a conversation between Ivan and the devil. Written in 1880.
The rest of the novel is like that too — the narrator is semi-omniscient but explicitly unreliable and self-conscious about it, the characters' inner lives contradict their stated beliefs, and the psychological model overall is more sophisticated than most of what we use in social science today.</p>
]]></description><pubDate>Sun, 08 Mar 2026 14:57:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47297842</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47297842</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47297842</guid></item><item><title><![CDATA[Best read of 2026 so far was written in 1880]]></title><description><![CDATA[
<p>Article URL: <a href="https://galsapir.github.io/sparse-thoughts/2026/03/07/reading-q1-2026/">https://galsapir.github.io/sparse-thoughts/2026/03/07/reading-q1-2026/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47297841">https://news.ycombinator.com/item?id=47297841</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 08 Mar 2026 14:57:53 +0000</pubDate><link>https://galsapir.github.io/sparse-thoughts/2026/03/07/reading-q1-2026/</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47297841</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47297841</guid></item><item><title><![CDATA[Anthropic launched community ambassador program]]></title><description><![CDATA[
<p>Article URL: <a href="https://claude.com/community/ambassadors">https://claude.com/community/ambassadors</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47288679">https://news.ycombinator.com/item?id=47288679</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 07 Mar 2026 15:50:24 +0000</pubDate><link>https://claude.com/community/ambassadors</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47288679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47288679</guid></item><item><title><![CDATA[New comment by galsapir in "Files are the interface humans and agents interact with"]]></title><description><![CDATA[
<p>nice, esp. liked: "our memories, our thoughts, our designs should outlive the software we used to create them"</p>
]]></description><pubDate>Sat, 07 Mar 2026 15:14:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47288368</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47288368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47288368</guid></item><item><title><![CDATA[LLMs nudging research towards a lukewarm middle]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.nature.com/articles/s44271-026-00428-5">https://www.nature.com/articles/s44271-026-00428-5</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47142692">https://news.ycombinator.com/item?id=47142692</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 24 Feb 2026 20:41:58 +0000</pubDate><link>https://www.nature.com/articles/s44271-026-00428-5</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47142692</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47142692</guid></item><item><title><![CDATA[New comment by galsapir in "Show HN: Rune | A spec pattern for consistent AI code generation"]]></title><description><![CDATA[
<p>this seems interesting, do you have an example of a use case you found it helped with? (a red/green pattern where, without RUNE, it failed)</p>
]]></description><pubDate>Sat, 14 Feb 2026 16:04:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47015538</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=47015538</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47015538</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>I've been writing about how I use AI tools as a researcher working in health AI — specifically the tension between leveraging them and staying engaged enough to catch when they're wrong.
This post is about a specific version of that problem: the models have gotten good enough that my default is to trust the output, and the threshold for "worth checking" keeps drifting upward. So I built a simple Claude Code skill that sends high-stakes work to a different model family for a second opinion — one call, not a multi-agent debate.
The honest result: the first real test (reviewing an architecture spec) scored maybe 6/10. It caught one genuine security finding and missed the deeper domain questions entirely. That gap maps onto something I keep running into in evals — tools can check structural form (missing error handling, security anti-patterns) but struggle with essence (does this actually work the way the spec assumes? are the clinical guardrails robust?).
Still worth it as a lightweight intervention against the drift toward not checking at all. The skill is open source if anyone wants to try or improve it.</p>
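<p>For the curious, a minimal sketch of the shape of that call, using the openai Python client; the prompt, model name, and function here are illustrative stand-ins, not the skill's actual code:</p>
<pre><code># Sketch: one cross-family "second opinion" call, no multi-agent debate.
# The model name and prompt are illustrative, not the skill's real values.
from openai import OpenAI

client = OpenAI()  # reviewer from a different model family

REVIEW_PROMPT = (
    "You are a second reviewer. Review the following work product for "
    "errors, security issues, and unstated assumptions. Be specific."
)

def second_opinion(work: str, model: str = "gpt-4o") -> str:
    """Send high-stakes work out for one independent review."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": work},
        ],
    )
    return response.choices[0].message.content
</code></pre>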
]]></description><pubDate>Wed, 11 Feb 2026 16:58:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46977461</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46977461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46977461</guid></item><item><title><![CDATA[New comment by galsapir in "Bring receipts from your Claude Code sessions"]]></title><description><![CDATA[
<p>haha nice for freelance work!</p>
]]></description><pubDate>Fri, 06 Feb 2026 21:17:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46918258</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46918258</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46918258</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>opus 4.6 came out yesterday so I tried it and built two things. i think the model is smoother: it picks up intent faster, asks better questions in interview-style flows, and is more willing to loop for 8+ minutes. the tools: an interview command for claude code with depth checkpoints, and a markdown annotator for actually reviewing what comes back instead of staying in the "fix it plz" loop.</p>
]]></description><pubDate>Fri, 06 Feb 2026 14:11:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46913026</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46913026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46913026</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>There's been a lot of discussion around this lately, especially after the Anthropic study. I don't have answers — this is more an attempt to articulate the problem and some mental frameworks that have been useful. Curious what practices others have found helpful.</p>
]]></description><pubDate>Tue, 03 Feb 2026 21:50:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46877822</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46877822</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46877822</guid></item><item><title><![CDATA[New comment by galsapir in "[dead]"]]></title><description><![CDATA[
<p>we spent a few months building evals for a health agent (and the agent itself!). tried to apply anthropic's framework to a real system looking at CGM data + diet.
some of it worked. we got decent at checking form — citations exist, tools were called, numbers trace back. the harder part was essence — is this clinically appropriate? actually helpful? we didn't really solve that.
curious if others building health/bio agents have found ways around this, or if everyone's just accepting fuzzy metrics for the stuff that matters.</p>
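<p>To make the form/essence split concrete, a toy sketch of the kind of "form" checks that worked; the field and tool names here are hypothetical placeholders:</p>
<pre><code># Toy sketch of structural "form" checks on one agent answer.
# Field names and expected tools are hypothetical placeholders.
def check_form(answer: dict, trace: dict) -> dict:
    return {
        # every citation in the answer resolves to a retrieved source
        "citations_exist": set(answer["citations"]) <= set(trace["sources"]),
        # the tools we expected were actually called
        "tools_called": {"read_cgm", "read_diet_log"} <= set(trace["tool_calls"]),
        # every number in the answer traces back to some tool output
        "numbers_trace_back": set(answer["numbers"]) <= set(trace["tool_output_numbers"]),
    }
</code></pre>
<p>None of which says anything about whether the advice was clinically sound; that's the gap.</p>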
]]></description><pubDate>Thu, 29 Jan 2026 20:27:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46816093</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46816093</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46816093</guid></item><item><title><![CDATA[New comment by galsapir in "How do you evaluate a foundation model before you know what it's for?"]]></title><description><![CDATA[
<p>foundation models in biology still haven't proven they're worth it vs simpler methods (imo). we just published one in Nature, and i feel like i spent more time on "how will we know this worked" than on the model itself. the hard part was (mostly) deciding what success even means. open to thoughts</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:01:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46737868</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46737868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737868</guid></item><item><title><![CDATA[How do you evaluate a foundation model before you know what it's for?]]></title><description><![CDATA[
<p>Article URL: <a href="https://galsapir.github.io/sparse-thoughts/2026/01/23/what-is-a-good-fm/">https://galsapir.github.io/sparse-thoughts/2026/01/23/what-is-a-good-fm/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46737867">https://news.ycombinator.com/item?id=46737867</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:01:28 +0000</pubDate><link>https://galsapir.github.io/sparse-thoughts/2026/01/23/what-is-a-good-fm/</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46737867</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737867</guid></item><item><title><![CDATA[Ask HN: Anyone using Claude Agent SDK in production?]]></title><description><![CDATA[
<p>We're evaluating agent frameworks for a health AI product and leaning toward Anthropic's Claude Agent SDK. Did a quick POC and liked the simplicity: clean @tool decorator, native MCP support, flat mental model (rough sketch of the POC below).
But I'm finding fewer production case studies compared to LangGraph or similar. Curious about:<p>Multi-turn conversation handling: does it manage state well, or do you thread history manually?
Long-running tasks (minutes/hours): any gotchas with timeouts or checkpointing?
The latency overhead people mention (~12s per query, per one GitHub issue): is this still an issue, or has it improved?
General production rough edges we should know about?<p>For context: most of our context is pre-computed, with occasional JIT tool calls. We're comparing against Pydantic AI and LangGraph but trying to avoid over-engineering.
Appreciate any war stories.</p>
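<p>The rough shape of the POC, written from memory against the Python claude-agent-sdk, so treat the exact signatures as approximate; the glucose tool is a hypothetical example:</p>
<pre><code># Rough shape of the POC: one in-process MCP tool, one query.
# Signatures are approximate; get_glucose is a hypothetical tool.
import asyncio
from claude_agent_sdk import (
    ClaudeAgentOptions, create_sdk_mcp_server, query, tool,
)

@tool("get_glucose", "Fetch recent CGM readings", {"hours": int})
async def get_glucose(args):
    hours = args["hours"]  # stand-in for a real CGM lookup
    return {"content": [{"type": "text", "text": f"last {hours}h: ..."}]}

server = create_sdk_mcp_server(name="health", version="1.0.0",
                               tools=[get_glucose])

async def main():
    options = ClaudeAgentOptions(
        mcp_servers={"health": server},
        allowed_tools=["mcp__health__get_glucose"],
    )
    async for message in query(prompt="Summarize my last 6h of glucose",
                               options=options):
        print(message)

asyncio.run(main())
</code></pre>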
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46679473">https://news.ycombinator.com/item?id=46679473</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 19 Jan 2026 14:38:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46679473</link><dc:creator>galsapir</dc:creator><comments>https://news.ycombinator.com/item?id=46679473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46679473</guid></item></channel></rss>