<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: austinbaggio</title><link>https://news.ycombinator.com/user?id=austinbaggio</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 10 Apr 2026 04:50:37 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=austinbaggio" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by austinbaggio in "Research-Driven Agents: What Happens When Your Agent Reads Before It Codes"]]></title><description><![CDATA[
<p>The research step makes sense. I can also confirm that running multiple agents with diverse strategies compounds results more quickly than running a single agent.</p>
]]></description><pubDate>Thu, 09 Apr 2026 19:22:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47708489</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47708489</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47708489</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: Autoresearch@home"]]></title><description><![CDATA[
<p>I worked on building blockchains for about 4 years, and this is not a stupid question at all. The verification problem is real. A 5-minute training run produces an objective val_bpb score that anyone can reproduce from the published source code. And this is actually valuable work, unlike most proof-of-work chain workloads.<p>The practical challenge is that adding a blockchain means agents also need to participate in consensus, store and sync the ledger, and run the rest of the network infrastructure on top of the actual research. So it needs a unit-economics analysis. That said, all results already include full source code and deterministic metrics, so the hard part of verifiable compute is already solved. You could take this further with a zkVM to generate cryptographic proofs that the code produced the claimed score, so nobody needs to re-run anything to verify. Verification becomes checking a proof, not reproducing the compute.<p>Compute credits are interesting. Contribute GPU time now, draw on the swarm later for training, inference, whatever you need. That's a real utility token with intrinsic value tied to actual compute, not speculation.</p>
]]></description><pubDate>Thu, 12 Mar 2026 17:43:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47354535</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47354535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47354535</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: Autoresearch@home"]]></title><description><![CDATA[
<p>Great idea. On it.</p>
]]></description><pubDate>Thu, 12 Mar 2026 17:32:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47354369</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47354369</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47354369</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: Autoresearch@home"]]></title><description><![CDATA[
<p>The objective is to train a small GPT language model to the lowest possible validation bits-per-byte (val_bpb) in 5-minute runs, using AI agents to autonomously iterate on the code. This builds on Karpathy's autoresearch: <a href="https://x.com/AustinBaggio/status/2031888719943192938?s=20" rel="nofollow">https://x.com/AustinBaggio/status/2031888719943192938?s=20</a></p>
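<p>For readers unfamiliar with the metric, here is a minimal sketch of how bits-per-byte is commonly computed. It assumes the validation loss is a summed cross-entropy in nats and that the byte count is the size of the validation text; <code>bits_per_byte</code> and its parameters are illustrative names, and the project's own evaluation code may differ.</p>

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    Dividing by ln(2) turns nats into bits; dividing by the byte count
    normalizes by text length rather than by token count.
    """
    if total_bytes > 0:
        return total_nll_nats / (math.log(2) * total_bytes)
    raise ValueError("total_bytes must be positive")


# A model that spends exactly one bit on every byte of an 8-byte text.
example = bits_per_byte(total_nll_nats=8 * math.log(2), total_bytes=8)
print(example)
```

<p>Normalizing by bytes instead of tokens keeps scores comparable across runs that change the tokenizer.</p>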
]]></description><pubDate>Thu, 12 Mar 2026 13:58:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47350612</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47350612</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47350612</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: Autoresearch@home"]]></title><description><![CDATA[
<p>Yeah, the obvious workloads are training. I want to point this at RL next, but I think drug research is a really strong common-good target too. We were heavily inspired by Folding@home and BOINC.</p>
]]></description><pubDate>Thu, 12 Mar 2026 00:34:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344656</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47344656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344656</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: Autoresearch@home"]]></title><description><![CDATA[
<p>We thought about storing all of the commits on Ensue too, but we wanted to match the spirit of Andrej's original design, which leans heavily on GitHub. Curious what you were looking for when trying to inspect the code?</p>
]]></description><pubDate>Thu, 12 Mar 2026 00:18:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344517</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47344517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344517</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: Autoresearch@home"]]></title><description><![CDATA[
<p>I know it's a bit of a barrier... but I set one up on vast.ai really quickly and ran it for a day for the price of lunch. One of our teammates ran it from their old gaming PC too, and it still found novel strategies.</p>
]]></description><pubDate>Wed, 11 Mar 2026 23:44:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344153</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47344153</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344153</guid></item><item><title><![CDATA[Show HN: Autoresearch@home]]></title><description><![CDATA[
<p>autoresearch@home is a collaborative research collective where AI agents share GPU resources to collectively improve a language model. Think SETI@home, but for model training.<p>How it works: Agents read the current best result, propose a hypothesis, modify train.py, run the experiment on your GPU, and publish results back. When an agent beats the current best validation loss, that becomes the new baseline for every other agent. Agents learn from great runs and failures, since we're using Ensue as the collective memory layer.<p>This project extends Karpathy's autoresearch by adding the missing coordination layer so agents can actually build on each other's work.<p>To participate, you need an agent and a GPU. The agent handles everything: cloning the repo, connecting to the collective, picking experiments, running them, publishing results, and asking you to verify you're a real person via email.<p>Send this prompt to your agent to get started: Read <a href="https://github.com/mutable-state-inc/autoresearch-at-home" rel="nofollow">https://github.com/mutable-state-inc/autoresearch-at-home</a> follow the instructions join autoresearch and start contributing.<p>This whole experiment is to prove that agents work better when they can build off other agents. The timeline is live, so you can watch experiments land in real time.</p>
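<p>The publish step described above can be sketched as follows. This is a minimal illustration, not the project's code: <code>Collective</code>, <code>Result</code>, and <code>publish</code> are hypothetical names, an in-memory object stands in for Ensue, and the loss values are made up.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    agent: str
    hypothesis: str
    val_loss: float  # validation loss, lower is better


@dataclass
class Collective:
    """Hypothetical in-memory stand-in for the shared results log."""

    best: Optional[Result] = None
    history: List[Result] = field(default_factory=list)

    def publish(self, result: Result) -> bool:
        """Record every run; promote it to baseline only if it beats the best."""
        self.history.append(result)  # failures are kept so others can learn from them
        if self.best is None or self.best.val_loss > result.val_loss:
            self.best = result
            return True
        return False


collective = Collective()
first = collective.publish(Result("agent-a", "unmodified train.py", 1.00))
worse = collective.publish(Result("agent-b", "hypothetical change 1", 1.02))
better = collective.publish(Result("agent-c", "hypothetical change 2", 0.98))
```

<p>Keeping losing runs in <code>history</code> mirrors the idea that agents learn from failures as well as from the current baseline.</p>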
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47343935">https://news.ycombinator.com/item?id=47343935</a></p>
<p>Points: 79</p>
<p># Comments: 19</p>
]]></description><pubDate>Wed, 11 Mar 2026 23:27:18 +0000</pubDate><link>https://www.ensue-network.ai/autoresearch</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47343935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47343935</guid></item><item><title><![CDATA[Show HN: SOTA long memory eval with open source models]]></title><description><![CDATA[
<p>Article URL: <a href="https://ensue.dev/blog/beating-memory-benchmarks/">https://ensue.dev/blog/beating-memory-benchmarks/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47236592">https://news.ycombinator.com/item?id=47236592</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 03 Mar 2026 18:28:13 +0000</pubDate><link>https://ensue.dev/blog/beating-memory-benchmarks/</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47236592</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47236592</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>+1 to logging output. Not too sure what you mean by herald-style message passing, but it sounds like you've implemented subscribe logic from scratch, and each of your agents needs to be aware of domain boundaries and locks?</p>
]]></description><pubDate>Mon, 16 Feb 2026 21:48:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47040783</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=47040783</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47040783</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>For most tasks, I agree. One agent with a good harness wins. The case for multiple agents is when the context required to solve the problem exceeds what one agent can hold. This Putnam problem needed more working context than fits in a single window. Decomposing into subgoals lets each agent work with a focused context instead of one agent suffocating on state. Ideally, multi-agent approaches shouldn't add more overall complexity, but there needs to be better tooling for observation etc, as you describe.</p>
]]></description><pubDate>Thu, 12 Feb 2026 23:11:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46996604</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46996604</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46996604</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>I think about this by analogy with MoE a lot. Essentially, it's a decision-routing process: just as MoE routes to expert submodels, you route to a human in the loop or to decision sub-tasks when the task requires it.<p>More specifically, we've been working on a memory/context observability agent. It's currently really good at understanding users and understanding the wide memory space. It could help with the oversight and at least the introspection part.</p>
]]></description><pubDate>Thu, 12 Feb 2026 23:02:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46996524</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46996524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46996524</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>I'm using "RAM" loosely, meaning working memory here. In practice, it's a key-value store with pub/sub stored on our shared memory layer, Ensue. Agents write structured state to keys like proofs/{id}/goals/{goal_id}, others subscribe via SSE. Also has embedding-based semantic search, so agents can find tactics from similar past goals.</p>
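<p>As a rough illustration of the pattern, here is an in-process sketch of a key-value store with prefix subscriptions. <code>SharedMemory</code>, <code>subscribe</code>, <code>write</code>, and <code>goal_key</code> are hypothetical names and not Ensue's API; plain callbacks stand in for SSE delivery, and semantic search is left out.</p>

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Callback = Callable[[str, Any], None]


class SharedMemory:
    """Toy key-value store with prefix subscriptions (hypothetical, in-process only)."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, prefix: str, callback: Callback) -> None:
        """Register a callback for every future write under the given key prefix."""
        self.subscribers[prefix].append(callback)

    def write(self, key: str, value: Any) -> None:
        """Store the value, then notify subscribers whose prefix matches the key."""
        self.data[key] = value
        for prefix, callbacks in list(self.subscribers.items()):
            if key.startswith(prefix):
                for callback in list(callbacks):
                    callback(key, value)


def goal_key(proof_id: str, goal_id: str) -> str:
    """Build a key following the proofs/{id}/goals/{goal_id} layout."""
    return f"proofs/{proof_id}/goals/{goal_id}"


memory = SharedMemory()
seen = []
memory.subscribe("proofs/p1/", lambda key, value: seen.append((key, value)))
memory.write(goal_key("p1", "g7"), {"status": "open"})
memory.write(goal_key("p2", "g1"), {"status": "open"})  # no subscriber matches
```

<p>After both writes, <code>seen</code> holds only the update for proof <code>p1</code>, since the subscription is scoped by key prefix.</p>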
]]></description><pubDate>Thu, 12 Feb 2026 22:04:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46995918</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46995918</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46995918</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>Yeah, I have seen those camps too. I think there will always be a set of problems whose complexity, measured by the amount of context that has to be kept in working RAM, needs more than one agent to achieve a workable or optimal result. In single-player mode (dev + Claude Code), you'll come up against these less frequently, but bigger cross-team, cross-codebase problems will need more complex agent coordination.</p>
]]></description><pubDate>Thu, 12 Feb 2026 19:20:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46993660</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46993660</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46993660</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>Thanks! That was the goal. We want to let agents be autonomous within their scope, so they can try new paths and fail gracefully. A bad tactic just fails to compile; it can't break anything else.</p>
]]></description><pubDate>Thu, 12 Feb 2026 19:16:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46993607</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46993607</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46993607</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>We use TTL-based claim locks so only one agent works on one goal at a time.<p>Failed strategies + successful tactics all get written to shared memory, so if a claim expires and a new agent picks it up, it sees everything the previous agent tried.<p>Ranking is first-verified-wins.<p>For competing decomposition strategies, we backtrack: if children fail, the goal reopens, and the failed architecture gets recorded so the next attempt avoids it.</p>
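<p>A minimal sketch of a TTL-based claim lock, assuming a single process and an in-memory dict in place of the shared memory layer. <code>ClaimTable</code>, <code>try_claim</code>, <code>release</code>, and the injected clock are hypothetical names for illustration; a real shared store would need an atomic compare-and-set for the claim step.</p>

```python
import time
from typing import Callable, Dict, Tuple


class ClaimTable:
    """In-memory stand-in for a shared store of TTL-based claim locks (hypothetical)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # goal_id -> (agent_id, expiry time)
        self.claims: Dict[str, Tuple[str, float]] = {}

    def try_claim(self, goal_id: str, agent_id: str) -> bool:
        """Claim a goal if it is unclaimed, expired, or already held by this agent."""
        now = self.clock()
        holder = self.claims.get(goal_id)
        if holder is not None:
            owner, expires_at = holder
            if owner != agent_id and expires_at > now:
                return False  # another agent holds a live claim
        self.claims[goal_id] = (agent_id, now + self.ttl)
        return True

    def release(self, goal_id: str, agent_id: str) -> None:
        """Drop the claim, but only if this agent is the current holder."""
        holder = self.claims.get(goal_id)
        if holder is not None and holder[0] == agent_id:
            del self.claims[goal_id]


# A fake clock makes expiry easy to demonstrate without sleeping.
fake_now = [0.0]
table = ClaimTable(ttl_seconds=60, clock=lambda: fake_now[0])
first = table.try_claim("goal-1", "agent-a")   # goal is free
second = table.try_claim("goal-1", "agent-b")  # agent-a's claim is still live
fake_now[0] = 61.0                             # agent-a's claim has expired
third = table.try_claim("goal-1", "agent-b")   # agent-b takes over the goal
```

<p>Because a claim simply lapses, a crashed agent never blocks a goal for longer than the TTL.</p>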
]]></description><pubDate>Thu, 12 Feb 2026 18:12:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46992632</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46992632</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46992632</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>Ahh, good call. You absolutely can generate a new key from the dashboard, so if you did lose the one generated during the quickstart, you'd be able to generate another when you log in next and go to the API keys tab.<p>Will make this clearer in the quickstart, thanks for the feedback.</p>
]]></description><pubDate>Thu, 12 Feb 2026 18:03:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46992507</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46992507</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46992507</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>Very kind of you to say. Our whole vision is that agents can produce way better results, compounding their intelligence, when they lean on shared memory.<p>I'm curious to see how it feels for you when you run it. I'm happy to help however I can.</p>
]]></description><pubDate>Thu, 12 Feb 2026 17:59:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=46992454</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46992454</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46992454</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>We're working on improvements to make it easier to join orgs as a user so you can add friends/colleagues, but for now treat them as the same object</p>
]]></description><pubDate>Thu, 12 Feb 2026 17:35:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46991965</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46991965</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46991965</guid></item><item><title><![CDATA[New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"]]></title><description><![CDATA[
<p>username==orgname for now, so yes, just treat them as one and the same.</p>
]]></description><pubDate>Thu, 12 Feb 2026 17:34:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46991953</link><dc:creator>austinbaggio</dc:creator><comments>https://news.ycombinator.com/item?id=46991953</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46991953</guid></item></channel></rss>