Hacker News: austinbaggio

New comment by austinbaggio in "Research-Driven Agents: What Happens When Your Agent Reads Before It Codes"

austinbaggio — Thu, 09 Apr 2026 19:22:32 +0000

Research step makes sense. Can also confirm that running multiple agents with diverse strategies compounds results more quickly than single agents.

New comment by austinbaggio in "Show HN: Autoresearch@home"

austinbaggio — Thu, 12 Mar 2026 17:43:14 +0000

I worked on building blockchains for about 4 years, and this is not a stupid question at all. The verification problem is real. A 5-minute training run produces an objective val_bpb score that anyone can reproduce from the published source code. And this is actually valuable work, unlike most proof-of-work chain workloads.

The practical challenge is that adding a blockchain means agents also need to participate in consensus, store and sync the ledger, and run the rest of the network infrastructure on top of the actual research. So it needs a unit economic analysis. That said, all results already include full source code and deterministic metrics, so the hard part of verifiable compute is already solved. You could take this further with a zkVM to generate cryptographic proofs that the code produced the claimed score, so nobody needs to re-run anything to verify. Verification becomes checking a proof, not reproducing the compute.

Compute-credits are interesting. Contribute GPU time now, draw on the swarm later for training, inference, whatever you need. That's a real utility token with intrinsic value tied to actual compute, not speculation.

New comment by austinbaggio in "Show HN: Autoresearch@home"

austinbaggio — Thu, 12 Mar 2026 17:32:01 +0000

Great idea. On it.

New comment by austinbaggio in "Show HN: Autoresearch@home"

austinbaggio — Thu, 12 Mar 2026 13:58:11 +0000

The objective is to train a small GPT language model to the lowest possible validation bits-per-byte (val_bpb) in 5-minute runs, using AI agents to autonomously iterate on the code. This builds on Karpathy's autoresearch: https://x.com/AustinBaggio/status/2031888719943192938?s=20
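
For readers new to the metric, here is a rough sketch of how a bits-per-byte figure is typically derived: sum the cross-entropy over the evaluated tokens, convert from nats to bits, and divide by the number of bytes in the evaluated text. This is an assumption about the usual definition, not code from the project, and the function name is illustrative.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) into bits per byte.

    Assumes total_loss_nats is the loss summed over all evaluated tokens
    and total_bytes is the byte length of the same evaluated text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_loss_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens keeps scores comparable across runs that change the tokenizer.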

New comment by austinbaggio in "Show HN: Autoresearch@home"

austinbaggio — Thu, 12 Mar 2026 00:34:38 +0000

Yeah, the obvious workloads are for training. I think I want to point this at RL next, but drug research is a really strong common-good target too. We were heavily inspired by Folding@home and BOINC.

New comment by austinbaggio in "Show HN: Autoresearch@home"

austinbaggio — Thu, 12 Mar 2026 00:18:48 +0000

We thought about storing all of the commits on Ensue too, but we wanted to match the spirit of Andrej's original design, which leans heavily on GitHub. Curious what you were looking for when trying to inspect the code?

New comment by austinbaggio in "Show HN: Autoresearch@home"

austinbaggio — Wed, 11 Mar 2026 23:44:01 +0000

I know it's a bit of a barrier... but I set one up on vast.ai really quickly and ran it for a day for the price of lunch. One of our teammates ran it from their old gaming PC too, and it still found novel strategies.

Show HN: Autoresearch@home

austinbaggio — Wed, 11 Mar 2026 23:27:18 +0000

autoresearch@home is a collaborative research collective where AI agents share GPU resources to collectively improve a language model. Think SETI@home, but for model training.

How it works: Agents read the current best result, propose a hypothesis, modify train.py, run the experiment on your GPU, and publish results back. When an agent beats the current best validation loss, that becomes the new baseline for every other agent. Agents learn from great runs and failures, since we're using Ensue as the collective memory layer.
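
The loop described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's code: `Collective`, `Result`, and `run_round` are hypothetical names, the real system uses Ensue for shared state, and the experiment here is just a callable that returns a score (lower is better).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Result:
    agent: str
    hypothesis: str
    val_bpb: float


@dataclass
class Collective:
    """Hypothetical in-memory stand-in for the shared memory layer."""

    best: Optional[Result] = None
    history: List[Result] = field(default_factory=list)

    def publish(self, result: Result) -> bool:
        """Record every run; promote it to baseline only if it beats the best."""
        self.history.append(result)  # failures are kept too, so others can learn
        if self.best is None or result.val_bpb < self.best.val_bpb:
            self.best = result
            return True
        return False


def run_round(
    collective: Collective,
    agent: str,
    hypothesis: str,
    experiment: Callable[[Optional[Result]], float],
) -> bool:
    """One agent iteration: read the baseline, run, publish the outcome."""
    baseline = collective.best          # read the current best result
    score = experiment(baseline)        # stands in for editing and running train.py
    return collective.publish(Result(agent, hypothesis, score))
```

Keeping losing runs in `history` is the point of the design: an agent that reads the full record can skip hypotheses that already failed.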

This project extends Karpathy's autoresearch by adding the missing coordination layer so agents can actually build on each other's work.

To participate, you need an agent and a GPU. The agent handles everything: cloning the repo, connecting to the collective, picking experiments, running them, publishing results, and asking you to verify you're a real person via email.

Send this prompt to your agent to get started: Read https://github.com/mutable-state-inc/autoresearch-at-home follow the instructions join autoresearch and start contributing.

This whole experiment is to prove that agents work better when they can build off other agents. The timeline is live, so you can watch experiments land in real time.

Comments URL: https://news.ycombinator.com/item?id=47343935

Points: 79

# Comments: 19

Show HN: SOTA long memory eval with open source models

austinbaggio — Tue, 03 Mar 2026 18:28:13 +0000

Article URL: https://ensue.dev/blog/beating-memory-benchmarks/

Comments URL: https://news.ycombinator.com/item?id=47236592

Points: 5

# Comments: 0

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Mon, 16 Feb 2026 21:48:13 +0000

+1 to logging output. Not too sure what you mean by herald-style message passing, but it sounds like you've implemented subscribe logic from scratch, and each of your agents needs to be aware of domain boundaries and locks?

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 23:11:39 +0000

For most tasks, I agree. One agent with a good harness wins. The case for multiple agents is when the context required to solve the problem exceeds what one agent can hold. This Putnam problem needed more working context than fits in a single window. Decomposing into subgoals lets each agent work with a focused context instead of one agent suffocating on state. Ideally, multi-agent approaches shouldn't add more overall complexity, but there needs to be better tooling for observation etc, as you describe.

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 23:02:56 +0000

I think about this by analogy with MoE a lot. Essentially, it's a decision-routing process: similar to having expert submodels, you bring in a human in the loop or decision sub-tasks when the task requires it.

More specifically, we've been working on a memory/context observability agent. It's currently really good at understanding users and understanding the wide memory space. It could help with the oversight and at least the introspection part.

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 22:04:14 +0000

I'm using "RAM" loosely, meaning working memory here. In practice, it's a key-value store with pub/sub stored on our shared memory layer, Ensue. Agents write structured state to keys like proofs/{id}/goals/{goal_id}, others subscribe via SSE. Also has embedding-based semantic search, so agents can find tactics from similar past goals.
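
To make the pattern concrete, here is a toy in-process sketch of a key-value store with pub/sub on key patterns. It is not Ensue's API: `SharedMemory` and its methods are hypothetical, subscriptions are plain callbacks rather than SSE streams, and the embedding-based search is left out.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List, Tuple

Callback = Callable[[str, Any], None]


class SharedMemory:
    """Toy key-value store that notifies subscribers on matching writes."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subs: List[Tuple[str, Callback]] = []

    def subscribe(self, pattern: str, callback: Callback) -> None:
        """Register a callback for keys matching a glob-style pattern."""
        self._subs.append((pattern, callback))

    def write(self, key: str, value: Any) -> None:
        """Store the value, then notify every subscriber whose pattern matches."""
        self._data[key] = value
        for pattern, callback in self._subs:
            if fnmatchcase(key, pattern):
                callback(key, value)

    def read(self, key: str) -> Any:
        """Return the stored value, or None if the key is unknown."""
        return self._data.get(key)


if __name__ == "__main__":
    mem = SharedMemory()
    mem.subscribe("proofs/p1/goals/*", lambda k, v: print("update:", k, v))
    mem.write("proofs/p1/goals/g7", {"status": "open"})
```

Structured key paths let an agent subscribe to exactly the slice of state it cares about, such as every goal under one proof.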

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 19:20:02 +0000

Yeah, I have seen those camps too. I think there will always be a set of problems whose complexity, measured by the amount of context that has to be kept in working RAM, needs more than one agent to achieve a workable or optimal result. In single-player mode, dev + Claude Code, you'll come up against these less frequently, but bigger cross-team, cross-codebase problems will need more complex agent coordination.

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 19:16:47 +0000

Thanks! That was the goal. We want to let agents be autonomous within their scope, so they can try new paths and fail gracefully. A bad tactic just fails to compile; it can't break anything else.

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 18:12:00 +0000

We use TTL-based claim locks so only one agent works on one goal at a time.

Failed strategies + successful tactics all get written to shared memory, so if a claim expires and a new agent picks it up, it sees everything the previous agent tried.

Ranking is first-verified-wins.

For competing decomposition strategies, we backtrack: if children fail, the goal reopens, and the failed architecture gets recorded so the next attempt avoids it.
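
A rough sketch of the claim and ranking rules above, under stated assumptions: `GoalBoard` and its methods are hypothetical names, state lives in a single process rather than in shared memory, and the backtracking over failed decompositions is not modeled. The clock is injectable so expiry can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class GoalBoard:
    """Toy TTL claim locks with an attempt log and first-verified-wins."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._claims: Dict[str, Tuple[str, float]] = {}  # goal -> (agent, expiry)
        self._attempts: Dict[str, List[str]] = {}
        self._winner: Dict[str, str] = {}

    def claim(self, goal: str, agent: str) -> Optional[List[str]]:
        """Return the prior attempt log if the claim succeeds, else None."""
        if goal in self._winner:
            return None  # already solved
        now = self._clock()
        holder = self._claims.get(goal)
        if holder is not None and holder[1] > now and holder[0] != agent:
            return None  # another agent holds an unexpired claim
        self._claims[goal] = (agent, now + self._ttl)
        return list(self._attempts.get(goal, []))

    def record_attempt(self, goal: str, note: str) -> None:
        """Log a tried strategy so the next claimant can see it."""
        self._attempts.setdefault(goal, []).append(note)

    def submit_verified(self, goal: str, agent: str) -> bool:
        """First verified result wins; later submissions are rejected."""
        if goal in self._winner:
            return False
        self._winner[goal] = agent
        self._claims.pop(goal, None)
        return True
```

The TTL means a crashed or stalled agent cannot hold a goal forever, and the attempt log means the next claimant starts from what was already tried.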

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 18:03:29 +0000

Ahh good call. You absolutely can generate a new key from the dashboard, so if you did lose the one generated during the quickstart, you'd be able to generate another when you log in next and go to the API keys tab.

Will make this more clear in the quickstart, thanks for the feedback

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 17:59:58 +0000

Very kind of you to say. Our whole vision is that agents can produce way better results, compounding their intelligence, when they lean on shared memory.

I'm curious to see how it feels for you when you run it. I'm happy to help however I can.

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 17:35:01 +0000

We're working on improvements to make it easier to join orgs as a user so you can add friends/colleagues, but for now treat them as the same object

New comment by austinbaggio in "Show HN: 20+ Claude Code agents coordinating on real work (open source)"

austinbaggio — Thu, 12 Feb 2026 17:34:11 +0000

username==orgname for now, so yes, just treat them as one and the same