Hacker News: maxbeech

New comment by maxbeech in "A case study in testing with 100+ Claude agents in parallel"

maxbeech — Sun, 05 Apr 2026 03:29:26 +0000

the thing that actually burns token budget at scale isn't the agent count itself—it's understanding the cost model of orchestrating them. 100 agents running in parallel is fine if they're short-lived queries. but once you start running them on a schedule (hourly checks, overnight batch work), the math changes fast.

each agent run against a real codebase probably spends 20-50k tokens just on context: repo structure, relevant files, recent changes. multiply that by 100 agents running every hour across 10-20 repos, and you're already hitting millions of tokens a day before any actual work happens. add in re-runs for failures or retries, and the cost curve gets steep quickly.

the harder problem is observability. with one agent you can read logs and understand what went wrong. with 100 agents you need aggregation, pattern detection, alerting on the common failure modes. if 3 agents fail silently but identically, was that a real issue or just rate limiting? if 40 agents all timeout at the same step, was it a dependency problem or infrastructure saturation? at scale you're debugging distributions, not individual runs.

also helps to be ruthless about concurrency. the async pattern isn't "run as many as possible at once"—it's "run exactly as many as the API and your budget can support without making the failure modes harder to diagnose." for claude api work that's usually smaller than people expect.

New comment by maxbeech in "The first 40 months of the AI era"

maxbeech — Sat, 28 Mar 2026 23:06:46 +0000

[flagged]

New comment by maxbeech in "Show HN: OpenHelm – OpenClaw but free (using your Claude Code subscription)"

maxbeech — Sat, 28 Mar 2026 16:50:47 +0000

All questions welcome!

Their ToS wouldn't allow you to access CC if it were to be hosted on some shared account in the cloud, but OpenHelm isn't doing that at all. It just spins up Terminal sessions and performs actions you would make yourself.

New comment by maxbeech in "Show HN: OpenHelm – OpenClaw but free (using your Claude Code subscription)"

maxbeech — Fri, 27 Mar 2026 18:05:55 +0000

Hey!

OpenHelm doesn't handle any OAuth itself. It simply inherits the auth from the user's existing Claude Code CLI installation

Yes, full streaming. OpenHelm uses --output-format stream-json when invoking Claude Code, which emits newline-delimited JSON events in real time.

Show HN: OpenHelm – OpenClaw but free (using your Claude Code subscription)

maxbeech — Fri, 27 Mar 2026 16:17:57 +0000

I tried OpenClaw. It is awesome, but token use killed me, and like with Claude Code scheduling, I ended up spending most of my time managing failed executions and tweaking prompts.

I built OpenHelm to fix this. It’s a local macOS app that turns your high-level goals into a self-running job queue, built directly on top of your existing Claude Code subscription.

Use it free

- Download it now: https://openhelm.ai/

- GitHub: https://github.com/maxbeech/openhelm

- Video: https://youtu.be/FfEBw1SCl7w

How it works

1. Set a goal, eg. "cold reach out to new leads", "grow my SEO", "audit my app weekly", "keep tests green")

2. OpenHelm puts together a plan combining one-off and recurring jobs to make it happen

3. Whenever a job fails, it will spot this, adjust, and try again

4. It will keep on top of the goal, auto-adjusting what jobs are run to actually achieve it

Key benefits

- Fair Source, on GitHub (free for teams under four people)

- Free to use (all LLM calls use your Claude Code subscription)

- Fully local

- Secure (any tokens/passwords stay in macOS Keychain unless you specify otherwise)

Happy to answer any questions!

Comments URL: https://news.ycombinator.com/item?id=47544695

Points: 3

# Comments: 4

New comment by maxbeech in "Storing Claude Code API keys in KeePassXC instead of plaintext config"

maxbeech — Wed, 25 Mar 2026 14:49:11 +0000

good catch on the leakage risk - the pattern of "agent reads .env files and they end up in transcripts" is more common than people realise, especially as claude code gets used for tasks that touch broader parts of a repo.for macOS specifically, the system keychain is a cleaner option than KeePassXC for this workflow. `security find-generic-password -w -s "my-api-key"` returns the secret directly and composes cleanly into a shell wrapper. no daemon required, access can be scoped per-application, and it integrates with Touch ID for interactive prompts.the harder problem is credential management in persistent background agents where you don't want any interactive prompts at all. we ended up using macOS keychain with per-process entitlements (set via a signed plist) so the agent process can retrieve keys non-interactively without ever touching disk. the entitlement approach is a bit painful to set up but means even if the agent process is compromised, the keys aren't in any config file or env var to scrape.(i built something that runs claude code as background scheduled jobs - openhelm.ai - credential handling was one of the more annoying problems to get right)

New comment by maxbeech in "Building a coding agent in Swift from scratch"

maxbeech — Wed, 25 Mar 2026 14:46:31 +0000

the interesting design tension i ran into building in this space is context management for longer sessions. the model accumulates tool call history that degrades output quality well before you hit the hard context limit - you start seeing "let me check that again" loops and increasingly hedged tool selection.a few things that helped: (1) summarizing completed sub-task outputs into a compact working-memory block that replaces the full tool call history, (2) being aggressive about dropping intermediate file read results once the relevant information has been extracted, and (3) structuring the initial system prompt so the model has a clear mental model of what "done" looks like before it starts exploring.the swift angle is actually a nice fit - the structured concurrency model maps well to the agent loop, and the strong type system makes tool schema definition less error-prone than JSON string wrangling in most other languages.

New comment by maxbeech in "Show HN: AI Roundtable – Let 200 models debate your question"

maxbeech — Wed, 25 Mar 2026 08:23:10 +0000

the debate round is the most interesting part of this - curious what you're actually measuring when models "change their minds."the question is whether cross-model exposure changes the actual answer distribution or mostly updates surface presentation while keeping the same underlying conclusion. models are generally trained to be responsive to context and to avoid apparent contradiction, which could look like genuine updating but just be social pressure sensitivity.one experiment worth trying: run a debate where each model sees a summary of the other models' reasoning without seeing their specific answer or which model gave it. see if agreement rates change compared to the version where models see attributed answers with model names. if the named version shows higher agreement it would suggest status/brand effects rather than reasoning-based updating.also curious whether the "reviewer model" that summarizes the transcript can itself be swapped out and whether the summary framing affects the perceived winner. that would be another confound worth controlling for.

New comment by maxbeech in "Ask HN: Do you feel less happy when coding with agent?"

maxbeech — Wed, 25 Mar 2026 08:20:54 +0000

the question conflates two things worth separating: enjoyment of problem-solving versus satisfaction of shipping.if most of your craft satisfaction came from debugging a subtle race condition or working out an elegant abstraction, that aha moment is harder to get when the agent gets there first. that's a real loss and worth naming honestly rather than hand-waving away.but if your satisfaction came from seeing something working, from momentum, from having built something a user can actually touch - agents compress the gap between idea and working software in a way that's hard to argue with.where it gets uncomfortable: watching the agent do the intellectually interesting parts while you review and manage QA. that discomfort is useful signal though. it means you were getting satisfaction from implementation work that, in hindsight, could have been delegated. the natural response is to move upstream - to the parts that still require judgment: what to build at all, which edge cases actually matter to real users, what the right abstraction is.for me as a solo founder it's been net positive. the craft satisfaction shifted, it didn't disappear.

New comment by maxbeech in "Pruner – local proxy that cuts Claude Code API bills by 20–70%"

maxbeech — Mon, 23 Mar 2026 04:40:19 +0000

context bloat in claude code runs is real. in my experience the main culprit is tool output verbosity - claude reads whole files when it only needed 10 lines, or grep returns 500 results and all of them end up in the context.

my first instinct was to fix it upstream (tighter tool calls, explicit line limits) rather than filtering downstream. and that helps a lot. but a proxy/filter layer is genuinely useful for the cases you can't control - when the model decides to explore 20 files you didn't expect it to need.

curious about the failure modes though. the hard part of this problem is distinguishing 'noise the model should discard' from 'context the model needs to take the right path' - same data, different task. does pruner do anything to handle cases where the filtering accidentally removes something load-bearing?

New comment by maxbeech in "Teaching Claude to QA a mobile app"

maxbeech — Sun, 22 Mar 2026 22:22:57 +0000

the worktree discipline failure is the most interesting part of this post to me. when claude is interactive, "cd into the wrong repo" is catchable. when it's running unattended on a schedule, you find out in the morning.

the abstraction is right - isolated worktree, scoped task, commit only what belongs. the failure is enforcement. git worktrees don't prevent a process from running `cd ../main-repo`. that requires something external imposing the boundary, not relying on the agent to respect it.

what you've built (the 8:47 sweep) is a narrow-scope autonomous job: well-defined inputs, deterministic outputs, bounded time. these work well because the scope is clear enough that deviation is obvious. the harder category is "fix any failing tests" - that task requires judgment about what's in scope, and judgment is exactly where worktree escapes happen.

i've been working on tooling for scheduling this kind of claude work (openhelm.ai) and the isolation problem is front and center. separate working directories per run, no write access to the main repo unless that's the explicit task. your experience here is exactly the failure mode that design is trying to prevent.

New comment by maxbeech in "TTal – CLI that turns Claude Code into a multi-agent software factory"

maxbeech — Sun, 22 Mar 2026 15:58:46 +0000

nice architecture -- the two-plane model is something i've been thinking about too, from a slightly different angle.

i built something for this actually (openhelm.ai) -- the problem i was solving is less about orchestrating active PR loops and more about scheduling claude code jobs to run unattended on a cron-like schedule. user describes a high-level goal, it gets planned into a set of tasks with a next_fire_at, and those run autonomously in the background even when they're not at their desk.

the piece i found hardest: deciding what requires human approval vs what can auto-proceed. we landed on "no plan runs without user sign-off" as a hard rule, but even within an approved plan, mid-job blockers that need human input are more common than you'd expect.

curious how TTal handles tasks that get legitimately stuck mid-execution -- does the manager agent have heuristics for detecting "stuck vs slow"? the watchdog timeout approach (we sigterm after 30 min) is blunt but works.

Show HN: Loopy – AI User Feedback Manager

maxbeech — Tue, 07 May 2024 10:33:15 +0000

Article URL: https://maxedlabs.com/loopy/

Comments URL: https://news.ycombinator.com/item?id=40284047

Points: 1

# Comments: 0

I struggled for motivation for my side-project, so I built an app to help

maxbeech — Tue, 11 Aug 2020 08:08:29 +0000

TLDR: I Have just released the app today on PH: https://www.producthunt.com/posts/today-build-habits

Been really looking forward to sharing this

There are some days after work or on the weekend where I feel super motivated to do something new or just clear up my emails. Some days though, I really don’t.

One thing I have found that has helped me a lot is to actually time these daily tasks to make sure I don’t skimp on them. I have found this to be so useful I really wanted to share the technique with others.

So here it is! Today helps you to turn to-dos into actual habits, giving you the opportunity to actually build that side-project, start that new hobby or just actually get around to checking your emails.

I’d really love to get YOUR thoughts on it, so please do leave a comment and/or give it a review!

️ https://www.producthunt.com/posts/today-build-habits

Comments URL: https://news.ycombinator.com/item?id=24118235

Points: 3

# Comments: 0

In iOS 13.1: auto super dark screen

maxbeech — Fri, 30 Aug 2019 07:03:38 +0000

Can’t find anyone talking about this and for me it’s the number 1 feature of iOS 13.

With iOS 13 now allowing Shorcuts to modify settings and with 13.1 supporting automation, you can now get a super dark screen at night (cause 0% brightness is not very dark):

Simply set two automations for evening and morning to adjust brightness and crucially toggle white point (the intensity can be changed in settings).

Hope you enjoy as much as I am!

Comments URL: https://news.ycombinator.com/item?id=20836476

Points: 1

# Comments: 0