Hacker News: 10keane

Yes, I set up Karpathy's LLM wiki. Now what?

10keane — Fri, 08 May 2026 13:46:53 +0000

Article URL: https://twitter.com/keane42443/status/2052426761477255448

Comments URL: https://news.ycombinator.com/item?id=48063085

Points: 1

# Comments: 0

"An Agent Cannot See Its Own Bugs" – things I notice running multi-agent daily

10keane — Mon, 27 Apr 2026 12:49:52 +0000

Article URL: https://twitter.com/keane42443/status/2048419624124113119

Comments URL: https://news.ycombinator.com/item?id=47920919

Points: 2

# Comments: 0

New comment by 10keane in "Show HN: Agent Vault – Open-source credential proxy and vault for agents"

10keane — Fri, 24 Apr 2026 06:55:33 +0000

been thinking about this exact problem for a while. my own setup uses OS keyring with a token substitution pattern — the agent requests a credential by name, the substitution happens at execution time, the LLM never sees the raw value in context or logs. works reasonably well.
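a minimal sketch of that substitution pattern, not the actual setup. the `{{secret:NAME}}` placeholder syntax, the function names and the dict standing in for the OS keyring are all assumptions made for illustration:

```python
import re
from typing import Callable, List

# hypothetical placeholder syntax; the comment does not say what the real one looks like
PLACEHOLDER = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")


def resolve_args(args: List[str], lookup: Callable[[str], str]) -> List[str]:
    """Swap each {{secret:NAME}} placeholder for the value lookup(NAME) returns.

    Meant to be called right before the command is executed, so the raw
    value never has to appear in the prompt or the transcript.
    """
    return [PLACEHOLDER.sub(lambda m: lookup(m.group(1)), arg) for arg in args]


def redact(text: str, secrets: List[str]) -> str:
    """Mask secret values in tool output before it is fed back into context."""
    for value in secrets:
        if value:  # skip empty strings, which would match everywhere
            text = text.replace(value, "[REDACTED]")
    return text


# a dict stands in for the keyring here (assumption); a real lookup would query the OS keyring
store = {"GITHUB_TOKEN": "s3cr3t-value"}
planned = ["curl", "-H", "Authorization: Bearer {{secret:GITHUB_TOKEN}}", "https://example.com"]
resolved = resolve_args(planned, store.__getitem__)
```

the model only ever plans with `planned`; `resolved` exists just long enough to run the command, and `redact` covers the case where the tool echoes the value back.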

but the problem with that model is it's static protection. if the agent process itself becomes hostile or gets prompt-injected, keyring doesn't really help — it can still request the secret and get it, it just doesn't see it in the context window.

the shift i've been landing on and building into Orbital (my own project) is that it's less about blocking credential access and more about supervising it. you want to know exactly when and why the agent is requesting something, and have the ability to approve or deny in the moment. pre-set policies are hard because you genuinely can't anticipate what tools an agent will call before it runs: claude code might use curl, bash, or a completely random command depending on the problem. the approval needs to happen at runtime, not preset.
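roughly what runtime supervision could look like, as a sketch only. `CredentialRequest`, `Supervisor` and the `ask` callback are hypothetical names, not Orbital's API; `ask` stands in for whatever surfaces the request to the human:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CredentialRequest:
    name: str     # credential requested by name
    command: str  # the command that will consume it
    reason: str   # the agent's stated reason


@dataclass
class Supervisor:
    # ask() shows the request to a human and returns True (approve) or False (deny)
    ask: Callable[[CredentialRequest], bool]
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def fetch(self, request: CredentialRequest, lookup: Callable[[str], str]) -> str:
        """Decide at request time, record the decision, then release the secret."""
        approved = self.ask(request)
        self.audit_log.append((request.name, request.command, approved))
        if not approved:
            raise PermissionError(f"credential {request.name!r} denied at runtime")
        return lookup(request.name)
```

there is no policy table: every decision is made with the actual command and reason in view, and the audit log answers "when and why" afterwards.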

the proxy model here is interesting because it creates a natural supervision boundary. curious whether you're planning runtime approval flows or if the design stays policy-based.

New comment by 10keane in "Workspace Agents in ChatGPT"

10keane — Thu, 23 Apr 2026 06:40:25 +0000

i actually like the concept of workspace agents, because i feel real pain running a long-term project while retaining context for each agent instance. but based on the demo it seems aimed more at collaboration than at preserving long-term project state: decisions made, actions taken, approvals given, history of what each agent did and why. in that case it is just a more convenient chatgpt entry point in a group chat.

another thing: this is all on OpenAI's servers. Which is fine if that's what you want. But there's a real class of user — technical, working on actual production code, security-conscious — for whom "my workspace lives on my machine, in my git repo, under my version control, works for my other non-openai tools" is a hard requirement, not a preference.

New comment by 10keane in "Show HN: Orbital – Give Your Agent a Project, Not a Prompt"

10keane — Tue, 21 Apr 2026 12:34:55 +0000

Hi HN, I'm keane. Orbital is an open-source desktop app for running AI agents in a managed environment. Been building it for two months while holding a day job. Solo dev, mac and windows installers on the release page.

Why this exists:

- I loved Claude Projects, but I couldn't let an agent update the project, and it didn't live on my machine. Cowork Projects now can — but only Claude, closed source.

- I loved OpenClaw, but I had no control over what it was doing on my behalf.

Neither was the thing I actually wanted.

The thing I actually wanted is rooted in a belief about where agent-human interaction is going: from micromanagement to delegation.

Micromanagement — where most of us are today. You give a specific task. You hand-hold the agent, watch it work, provide context when it asks, correct it when it drifts. You're guiding an intern.

Delegation — what I want. It's handing work to a colleague the way you would when you leave a company: you give them all the context, describe the objective, set the boundaries, and then let go. Maybe they check in periodically, but you don't watch every keystroke.

For delegation to work, the agent needs a place to live. Not a session. A "project".

What a project is in Orbital:

- A workspace folder

- Persistent human-editable memory that survives restarts

- A budget cap

- A sandbox

- An autonomy preset (supervised / check-in / hands-off)

- Approval gates on write-risk tool calls

- A shared space for sub-agent coordination. The management agent is an autonomous agent I wrote; sub-agents (Claude Code, Codex, Goose, Cline) are discovered via SDK/ACP/PTY on the system path and called with separate context windows so their output doesn't bleed back into the main one.
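As a rough illustration of the "discovered on the system path" part only: a minimal sketch using Python's `shutil.which`. The binary names below are assumptions for illustration, and this is not taken from Orbital's code; the SDK/ACP/PTY handoff is not shown.

```python
import shutil
from typing import Callable, Dict, Optional

# Assumed executable names for each sub-agent CLI; adjust to what is installed.
CANDIDATE_BINARIES: Dict[str, str] = {
    "Claude Code": "claude",
    "Codex": "codex",
    "Goose": "goose",
    "Cline": "cline",
}


def discover_agents(
    candidates: Dict[str, str] = CANDIDATE_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, str]:
    """Return {label: resolved path} for every candidate found on PATH."""
    found: Dict[str, str] = {}
    for label, binary in candidates.items():
        path = which(binary)  # None when the executable is not on PATH
        if path is not None:
            found[label] = path
    return found
```

Calling `discover_agents()` with no arguments checks the real PATH; the `which` parameter exists so the lookup can be swapped out in tests.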

The project is the unit you delegate. Everything else — approvals, budgets, memory, boundaries — are the affordances that make delegation actually safe.

Everything is transparent. Everything, from input to output, is on your machine. The only thing that leaves is the LLM API call.

Where this fits relative to other things:

- Claude/Cowork Projects: closest mental model, but you can't dispatch other agents like Codex to work in parallel. Exclusive to Claude.

- OpenClaw / Hermes: session-centric or agent-centric. Orbital is project-centric. Your project can delegate to them as sub-agents (planned).

What's real today. 335 commits over two months. Desktop installers for mac and windows. Used daily for a month — including to run my own launch prep. There's a distilled marketing-agent skill inside the repo that reads my calendar and drafts the next day's tasks, which is how I'm shipping this at all while holding a day job.

What's not there yet. Linux sandbox. Native mobile app (today it's LAN QR pairing plus an optional relay for remote supervision). Agent marketplace. Cross-project coordination with approval cascades. Adaptive autonomy.

Happy to answer questions about the architecture, the sub-agent handoff, the sandbox trade-offs, or anything else.

Show HN: Orbital – Give Your Agent a Project, Not a Prompt

10keane — Tue, 21 Apr 2026 12:33:43 +0000

Article URL: https://github.com/zqiren/Orbital

Comments URL: https://news.ycombinator.com/item?id=47847897

Points: 3

# Comments: 1

New comment by 10keane in "Ask HN: What skills are future proof in an AI driven job market?"

10keane — Tue, 21 Apr 2026 09:40:32 +0000

management and critical thinking.

management - it occurred to me that giving instructions to an agent is very similar to giving instructions to human employees - even the best of them make mistakes.

i learnt that asking claude code to "investigate for 3 potential root causes" is more effective than "investigate the root cause" when fixing bugs. this blew my mind as i realized that agents can be lazy and careless, and that we can give better instructions to prevent that.

another reason why i said this is that giving enough context and defining the blast boundary is more efficient than hand-holding/micromanaging and checking every tool call for agents. the management skills used for human employees also work here.

critical thinking - you just need to have your own judgement on the seemingly solid but actually hallucinated agent bs.

New comment by 10keane in "Apple's accidental moat: How the "AI Loser" may end up winning"

10keane — Fri, 17 Apr 2026 06:50:46 +0000

idk man. what do you think is their business model? is it more like ai value-add on their existing services?

New comment by 10keane in "Tell HN: I regret every single time I use AI"

10keane — Fri, 17 Apr 2026 06:48:39 +0000

i am using --dangerously-skip-permissions with a task spec. i think this is faster, and it actually gave me more control over architecture and product decisions. i think i just dont like reacting to suggestions mid-flow

New comment by 10keane in "Show HN: Agent Armor, a Rust runtime for enforcing policies on AI agent actions"

10keane — Fri, 17 Apr 2026 06:43:41 +0000

great project. think my agent will need it. but then one thing i notice is that this only catches single tool calls. most of the time the malicious behavior is a sequence where each call looks fine on its own: read a file, read another, then a curl to somewhere benign-sounding. individually each one scores low. the arc is the dangerous part and per-call scoring kinda misses that.
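a toy sketch of the difference between per-call and sequence-aware scoring. the scores, the threshold and the "reads followed by egress" rule are made up for illustration and have nothing to do with how Agent Armor actually scores calls:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCall:
    tool: str
    target: str


def call_score(call: ToolCall) -> float:
    """Hypothetical per-call risk: each of these looks low-risk on its own."""
    return {"read_file": 0.1, "curl": 0.2}.get(call.tool, 0.3)


@dataclass
class SessionMonitor:
    threshold: float = 0.5
    history: List[ToolCall] = field(default_factory=list)

    def observe(self, call: ToolCall) -> float:
        """Score a call in the context of everything that came before it."""
        prior_reads = sum(1 for c in self.history if c.tool == "read_file")
        self.history.append(call)
        score = call_score(call)
        if call.tool == "curl" and prior_reads > 0:
            # network egress after file reads: risk grows with the number of reads
            score += 0.3 * prior_reads
        return min(score, 1.0)
```

with two file reads followed by a curl, every call stays under the threshold when scored alone, but the curl crosses it once the monitor takes the earlier reads into account.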

New comment by 10keane in "Sal Khan's AI revolution hasn't happened yet"

10keane — Thu, 16 Apr 2026 07:18:59 +0000

what is the point of teaching anyway when foundational knowledge is becoming obsolete?

i think what should be taught is metacognitive ability - like how to retrieve knowledge, how to ask the right questions towards a certain goal. knowledge itself is easily accessible with ai. now the difficult part is the ability to discern actual knowledge from llm hallucination bs, and the ability to retrieve the required knowledge given a scenario.

this still requires some foundational grounding — you can't detect bullshit with zero context. but the balance shifts from memorization to retrieval, iteration, verification. honestly i think it is more about critical thinking and philosophy.

New comment by 10keane in "Vibe Coding Fails"

10keane — Wed, 15 Apr 2026 14:12:33 +0000

exactly. vibe coding only works when you fully understand the problem and know precisely how to solve it. ai just does the dirty implementation work for you.

that is another reason why i separate product/architecture design and implementation into two agents with isolated context in my workflow. i can always iterate with the product agent to refine my understanding and THEN ask the coding agent to implement it. by that time i already have the ability to make proper judgements and evaluate the coding agent's output

Why Vibe Coding Fails

10keane — Wed, 15 Apr 2026 13:50:04 +0000

i am using claude to maintain an agent loop, which pauses to ask for the user's approval before important tool calls. while doing some bug fixes, i have identified some clear patterns and reasons why vibe coding can fail for people who dont have technical knowledge and architecture expertise.

let me describe my workflow first - this has been my workflow across hundreds of successful sessions:

1. identify bugs through dogfooding

2. ask claude code to investigate the codebase for three potential root causes

3. paste the root causes and proposed fixes into the claude project where i store all architecture docs and design decisions, for it to evaluate

4. discuss with claude in the project to write a detailed task spec - the task spec has a specified format with all sorts of tests

5. give it back to claude code to implement the fix

in today's session, the root cause analysis was still great, but the proposed fixes were so bad that i really think this is how most vibe-coded projects lose maintainability in the long run.

here are two of the root causes and proposed fixes:

bug: agent asks for user approval, but sometimes the approval popup doesnt show up. i tried sending a message to unstick it. message got silently swallowed. agent looks dead. and i needed to restart the entire thing.

claude's evaluation: root cause 1: the approval popup is sent once over a live connection. if the user's ui isn't connected at that moment — page refresh, phone backgrounded, flaky connection — they never see it. no retry, no recovery.

this is actually true.

proposed fix: "let's save approval state to disk so it survives crashes". sounds fine, but the key point is that by design, if things crash, the agent cold-resumes from the session log and wont pick up the saved approval state anyway. the fix just adds schema complexity and is completely useless

root cause 2: when an approval gets interrupted (daemon crash, user restart), there's an orphan tool_call in the session history with no matching tool_result.

proposed fix: "write a synthetic tool_result to keep the session file structurally valid." sounds clean. but i asked: who actually breaks on this? the LLM API? no, it handles missing results. the session replay? no, it reads what's there. the orphan tool_call accurately represents what happened: the tool was called but never completed. that's the truth. writing a fake result to paper over it introduces a new write-coordination concern (when exactly do you write the fake result? what if the daemon crashes during the write?) to solve a problem that doesn't exist. the session file isn't "broken." it's accurate.
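a sketch of what "it reads what's there" can mean in practice: replay detects orphan tool calls and treats them as never completed, without writing anything back. the entry schema (`type`, `id`) is a hypothetical stand-in, not the real session format:

```python
from typing import Dict, List


def find_orphan_tool_calls(session: List[Dict]) -> List[str]:
    """Return ids of tool_call entries that have no matching tool_result.

    Read-only: the session log is left exactly as it was recorded.
    """
    completed = {e["id"] for e in session if e.get("type") == "tool_result"}
    return [
        e["id"]
        for e in session
        if e.get("type") == "tool_call" and e["id"] not in completed
    ]


# hypothetical session log: call-2 was interrupted before a result was written
session = [
    {"type": "tool_call", "id": "call-1", "tool": "read_file"},
    {"type": "tool_result", "id": "call-1", "output": "ok"},
    {"type": "tool_call", "id": "call-2", "tool": "write_file"},
]
orphans = find_orphan_tool_calls(session)
```

because nothing is written, there is no write-coordination question to answer; the orphan stays in the log as a record that the call never completed.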

claude had full architecture docs, the codebase, and over a hundred sessions of project history in context. it still reaches for the complex solution because it LOOKS like good engineering. it never asked "does it even matter after a restart?"

i have personally encountered this preference for seemingly more robust over-engineering multiple times. and i genuinely believe that this is where the human operator should actually step in, instead of giving a one-sentence requirement and watching agents do all sorts of "robust" engineering.

Comments URL: https://news.ycombinator.com/item?id=47778946

Points: 7

# Comments: 3

New comment by 10keane in "Ask HN: How can we weaken China?"

10keane — Wed, 15 Apr 2026 11:33:57 +0000

if not for some strategic mistakes made by the US, you wouldnt even need to ask this question in the first place

New comment by 10keane in "Tell HN: I regret every single time I use AI"

10keane — Tue, 14 Apr 2026 05:30:46 +0000

i think there are two key things that helped me ship more successfully using ai

1. must isolate context. discuss with your architecture agent, implement with another. you can pass the implementation results back to the architecture agent to check for implementation drift. ai's self check and correction sucks - i guess it is because of the attention mechanism?

2. iterate with your architecture agent to produce a tightly scoped task spec. you really need to iterate and ask it to align with you on the key assumptions. dont be too ambitious. i myself have a guideline for task spec writing which specifies that a spec cannot cross a subsystem boundary or touch 2 subsystems in one go

but honestly, ai is only great at diagnosis and implementation. most of my successful runs are on the basis that i know exactly how to code or how to solve the problem. ai just does the dirty work for me.

New comment by 10keane in "Comprehension Debt: The Hidden Cost of AI-Generated Code"

10keane — Tue, 14 Apr 2026 05:20:38 +0000

well written. finally someone mentioned that a human operator who has the full architecture context is needed. that, i think, is the role of humans in coding in the future.

but i will argue one thing though. the spec approach is good enough with current model capability. it is a matter of scoping. if you scope the spec correctly and make it granular enough, the agent will produce a replicable implementation. and looking into the future, as model capabilities advance, the spec approach will get better and better, allowing larger spec scopes to be implemented at once

New comment by 10keane in "As AI use increases at work, many still choose not to use it, Gallup poll finds"

10keane — Mon, 13 Apr 2026 14:03:41 +0000

there is really an arbitrage opportunity now with ai. you can just throw a task to ai, do some minor edits afterwards, and claim that you did it manually. can significantly inflate your perceived capacity lol. works for bosses that are not that ai-savvy

New comment by 10keane in "Apple's accidental moat: How the "AI Loser" may end up winning"

10keane — Mon, 13 Apr 2026 06:05:05 +0000

there are always three elements in the equation of a business model: 1. marginal cost 2. marginal revenue 3. value created

for llm providers, i always believe the key is to focus on high value problems such as coding or knowledge work, because of the high marginal cost of each new customer - the tokens burnt - and the low marginal revenue if the problem is not valuable enough. in this sense no llm provider can scale like previous social media platforms without taking huge losses. and no meaningful user stickiness can be built unless you have users' data. and there is no meaningful business model unless people are willing to pay a high price for the problem you solve, in the same way as paying for a saas.

i am really not optimistic about the llm providers other than anthropic. it seems that the rest are just burning money, and for what? there is no clear path for monetization.

and when local llms are powerful enough, the providers will soon be obsolete because of the cost and the unsustainable business model. at the end of the day, i do agree that it is the consumer hardware providers that can win this game.

New comment by 10keane in "Ask HN: Do you trust AI agents with API keys / private keys?"

10keane — Mon, 13 Apr 2026 02:17:10 +0000

yah man i saw your project on the execution layer. i think it is great. but one thing i notice in my daily usage is that i am not sure what to allow or deny before the actual usage. like personally i am not able or interested in pre-setting policies. like claude code, you never know what agents want to call before the actual tool use - could be curl, bash, a random command for a random solution to a random problem. so i believe this supervision needs to be at runtime instead of preset

New comment by 10keane in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"

10keane — Sun, 12 Apr 2026 13:55:15 +0000

this same pattern seems to occur every time a new model is about to release. i didnt notice the usage problem - i am on 20x. but opus 4.6 feels significantly dumber for some reason. i cant quantify it, but it failed on everyday tasks that it used to complete perfectly