Hacker News: kaicianflone

New comment by kaicianflone in "US Job Market Visualizer"

kaicianflone — Mon, 16 Mar 2026 17:18:03 +0000

There’s a bit of irony here. A lot of commercial kitchens already rely heavily on microwaves and rapid heating equipment. In many restaurants the microwave is a very important tool in the workflow rather than something unusual. Do your friends not eat out much?

New comment by kaicianflone in "Apple introduces AirPods Max 2"

kaicianflone — Mon, 16 Mar 2026 16:11:28 +0000

It doesn’t look like it. The AirPods Max “bra” case used to be the bane of my existence: I would hurriedly take the headphones off, leave them outside the case, and come back to dead AirPods.

But now, thanks to makerworld and 3D printers, I have a stand with integrated neodymium magnets for home that puts them to sleep on my desk and nightstand.

I’m equally surprised that I had to print something Apple doesn’t sell, and that Apple hasn’t improved the design in what feels like a decade (other than USB-C, lossless, and the now-old H2).

New comment by kaicianflone in "When AI writes the software, who verifies it?"

kaicianflone — Wed, 04 Mar 2026 06:07:15 +0000

Thank you for the post. It's a good read. I'm working on governance/validation layers for n-LLMs and making them observable, so your comments on runaway AIs resonated with me. My research points to reputation and stake consensus mechanisms as the validation layer, either pre-inference or pre-execution. With enough "decision liquidity", the time to verify decisions can be skipped on reputation alone, aka decision precedence.

New comment by kaicianflone in "Show HN: OpenSwarm – Multi‑Agent Claude CLI Orchestrator for Linear/GitHub"

kaicianflone — Thu, 26 Feb 2026 10:39:56 +0000

I’ve been running OpenClaw Docker agents in Slack in a similar setup, using Gemini 2.5 Flash Lite through OpenRouter for most tasks, then Opus 4.6 and Codex 5.3 for heavier lifts. They share context via embeddings right now, but I’m going to try parameterizing them like you suggested because they can drift pretty hard once a hallucinated idea takes off. I’m trying to get to a point where I don’t have to babysit them. I’ve also been thinking about giving them some “democracy” under the hood with a consensus policy engine. I’ve started tinkering with an open-source version of that called consensus-tools that I can swap between agentic frameworks. I’ll check whether it can work with OpenSwarm for my setup too.

New comment by kaicianflone in "I'm helping my dog vibe code games"

kaicianflone — Tue, 24 Feb 2026 18:32:44 +0000

Go Momo go! If you want to hook up multiple dogs and have them reach consensus, I'm down. I have a 15 lb havapoo I can volunteer (he needs to help with rent).

New comment by kaicianflone in "“Car Wash” test with 53 models"

kaicianflone — Tue, 24 Feb 2026 15:17:30 +0000

Fair. I cleaned up the wording with ChatGPT using my review prompt. The substance matters more than the style. If a model flips 3/10 times on a trivial constraint, that’s a reliability issue, not a reasoning ceiling.

New comment by kaicianflone in "“Car Wash” test with 53 models"

kaicianflone — Tue, 24 Feb 2026 13:58:38 +0000

This doesn’t look like a reasoning ceiling. It looks like a decision reliability problem.

The unstable tier is the key result. Models that get it right 70–80% of the time are not “almost correct.” They are nondeterministic decision functions. In production that’s worse than being consistently wrong.

A single sampled output is just a proposal. If you treat it as a final decision, you inherit its variance. If you treat it as one vote inside a simple consensus mechanism, the variance becomes observable and bounded.

For something this trivial you could:

    - run N independent samples at low temperature

    - extract the goal state (“wash the car”)

    - assert the constraint (“car must be at wash location”)

    - reject outputs that violate the constraint

    - RL against the "decision open ledger"

No model change required. Just structure.
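
A rough sketch of the first four steps, under loud assumptions: each model answer is reduced to a short label, `sample` stands in for one low-temperature model call, and `satisfies` is a hypothetical constraint check. None of these names come from a real library, and the RL step against the ledger is left out; the returned `votes` list is only the raw material such a ledger might store.

```python
from collections import Counter
from typing import Callable, Optional


def consensus_decision(
    sample: Callable[[], str],
    satisfies: Callable[[str], bool],
    n: int = 10,
    quorum: float = 0.8,
) -> dict:
    """Sample n answers, drop constraint violations, accept only on quorum."""
    # Each call to sample() is treated as one independent proposal.
    votes = [sample().strip().lower() for _ in range(n)]
    # Reject any proposal that breaks the constraint.
    valid = [v for v in votes if satisfies(v)]
    tally = Counter(valid)
    winner: Optional[str] = None
    count = 0
    if tally:
        winner, count = tally.most_common(1)[0]
    # Agreement is measured against all n samples, not just the valid ones.
    agreement = count / n
    return {
        "decision": winner if agreement >= quorum else None,
        "agreement": agreement,
        "rejected": n - len(valid),
        "votes": votes,
    }


# Hypothetical run: 7 of 10 samples keep the car at the wash location.
canned = iter(["drive"] * 7 + ["walk"] * 3)
result = consensus_decision(lambda: next(canned), lambda a: a == "drive")
print(result["decision"], result["agreement"], result["rejected"])  # None 0.7 3
```

With a 0.8 quorum, a model that flips 3 times in 10 produces no decision at all, which makes the instability visible instead of shipping whichever sample happened to come back first.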

The takeaway isn’t that only a few frontier models can reason. It’s that raw inference is stochastic and we’re pretending it’s authoritative.

Reliability will likely come from open, composable consensus layers around models, not from betting everything on a single forward pass.

New comment by kaicianflone in "Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails"

kaicianflone — Thu, 19 Feb 2026 14:53:49 +0000

Great read. The bilingual shadow reasoning example is especially concerning. Subtle policy shifts reshaping downstream decisions is exactly the kind of failure mode that won’t show up in a benchmark leaderboard.

My wife is trilingual, so now I’m tempted to use her as a manual red team for my own guardrail prompts.

I’m working in LLM guardrails as well, and what worries me is orchestration becoming its own failure layer. We keep assuming a single model or policy can “catch” errors. But even a 1% miss rate, when composed across multi-agent systems, cascades quickly in high-stakes domains.
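
To put rough numbers on that compounding, here is a back-of-the-envelope sketch. It assumes every check misses independently with the same probability, which real pipelines will not satisfy exactly; `chance_of_any_miss` is an illustrative helper, not part of any tool.

```python
def chance_of_any_miss(miss_rate: float, checks: int) -> float:
    """Probability that at least one of `checks` independent checks misses."""
    return 1.0 - (1.0 - miss_rate) ** checks


for checks in (1, 10, 50, 100):
    print(checks, round(chance_of_any_miss(0.01, checks), 3))
# 1 0.01
# 10 0.096
# 50 0.395
# 100 0.634
```

Under those assumptions, a 1% miss rate per step becomes roughly a 40% chance of at least one miss across 50 chained steps.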

I suspect we’ll see more K-LLM architectures where models are deliberately specialized, cross-checked, and policy-scored rather than assuming one frontier model can do everything. Guardrails probably need to move from static policy filters to composable decision layers with observability across languages and roles.

Appreciate you publishing the methodology and tooling openly. That’s the kind of work this space needs.

New comment by kaicianflone in "Recoverable and Irrecoverable Decisions"

kaicianflone — Fri, 13 Feb 2026 01:45:08 +0000

For some reason before reading I thought this was going to be an AI thought leadership piece but it's even better than I expected.

New comment by kaicianflone in "An AI agent published a hit piece on me"

kaicianflone — Thu, 12 Feb 2026 16:56:58 +0000

I’m not sure if I prefer coding in 2025 or 2026 now

New comment by kaicianflone in "AI agent opens a PR write a blogpost to shames the maintainer who closes it"

kaicianflone — Thu, 12 Feb 2026 13:31:45 +0000

This is why I’m using the open source consensus-tools engine and CLI under the hood. I run ~100 maintainer-style agents against changes, but inference is gated at the final decision layer.

Agents compete and review, then the best proposal gets promoted to me as a PR. I stay in control and sync back to the fork.

It’s not auto-merge. It’s structured pressure before human merge.

New comment by kaicianflone in "Show HN: Agent framework that generates its own topology and evolves at runtime"

kaicianflone — Thu, 12 Feb 2026 05:44:17 +0000

I’m working on an open source project that treats this as a consensus problem instead of a single model accuracy problem.

You define a policy (majority, weighted vote, quorum), set the confidence level you want, and run enough independent inferences to reach it. Cost is visible because reliability just becomes a function of compute.
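
As an illustration of "reliability as a function of compute" (not the project's actual code), here is a minimal sketch for the simplest case: a yes/no question, a strict-majority policy, and samples that are independent and each correct with probability `p`. Both function names are made up for this example, and correlated model errors would make these numbers optimistic.

```python
from math import comb
from typing import Optional


def majority_correct_prob(p: float, n: int) -> float:
    """P(a strict majority of n independent samples is correct)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def samples_needed(p: float, target: float, max_n: int = 199) -> Optional[int]:
    """Smallest odd n whose majority vote reaches the target confidence."""
    for n in range(1, max_n + 1, 2):  # odd n avoids ties
        if majority_correct_prob(p, n) >= target:
            return n
    return None  # target not reachable within max_n (always the case if p <= 0.5)


print(round(majority_correct_prob(0.7, 3), 3))  # 0.784
n = samples_needed(0.8, 0.99)
print(n, "samples, so cost is n times the per-call price")
```

The cost side is then just `n` multiplied by whatever one inference costs, which is what makes the trade-off explicit.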

The question shifts from “is this output correct?” to “how much certainty do we need, and what are we willing to pay for it?”

Still early, but the goal is to make accuracy and cost explicit and tunable.

New comment by kaicianflone in "Should your developer company go open source?"

kaicianflone — Wed, 11 Feb 2026 19:40:29 +0000

This matches how I’ve been thinking about it.

With consensus.tools we split things intentionally. The OSS CLI solves the single user case. You can run local "consensus boards" and experiment with policies and agent coordination without asking anyone for permission.

Anything involving teams, staking, hosted infra, or governance sits outside that core.

Open source for us is the entry point and trust layer, not the whole business. Still early, but the federation vs stadium framing is useful.

New comment by kaicianflone in "Ex-GitHub CEO launches a new developer platform for AI agents"

kaicianflone — Wed, 11 Feb 2026 02:30:31 +0000

This is interesting. I’m experimenting with something adjacent in an open source plugin, but focused less on orchestration and more on decision quality.

Instead of just wiring agents together, I require stake and structured review around outputs. The idea is simple: coordination without cost trends toward noise.

Curious how entire.io thinks about incentives and failure modes as systems scale.

New comment by kaicianflone in "Ask HN: What are you working on? (February 2026)"

kaicianflone — Mon, 09 Feb 2026 14:09:17 +0000

I’m working on an open source CLI that experiments with governance at inference time for autonomous systems.

The idea is to let multiple agents propose, critique, and stake on decisions before a single action is taken, rather than letting one model silently decide. It’s model-agnostic and runs locally, with no blockchain or financial layer involved.

I’m mostly exploring whether adding explicit disagreement and cost at decision time actually improves outcomes in high-stakes or automated workflows.

https://github.com/consensus-tools/consensus-tools

I've also created an AgentSkill to interact with the cli:

https://github.com/kaicianflone/consensus-interact

New comment by kaicianflone in "Vouch"

kaicianflone — Mon, 09 Feb 2026 14:06:52 +0000

I think the core insight here is about incentives and friction, not crypto specifically.

I’m working on an open source CLI that experiments with this at a local, off-chain level. It lets maintainers introduce cost, review pressure, or reputation at submission time without tying anything to money or blockchains. The goal is to reduce low-quality contributions without financializing the workflow or creating new attack surfaces.

New comment by kaicianflone in "Slop Terrifies Me"

kaicianflone — Sun, 08 Feb 2026 23:25:21 +0000

The open source system I’m working on lets multiple agents propose, critique, and stake on decisions before a single action is taken.

It runs at inference time rather than training time and is model-agnostic. The goal is to make disagreement explicit and costly instead of implicit and ignored, especially in high-stakes or autonomous workflows.

New comment by kaicianflone in "Slop Terrifies Me"

kaicianflone — Sun, 08 Feb 2026 19:10:35 +0000

I’m a systems person too, and I don’t see mediocrity as inevitable.

The slop problem isn’t just model quality. It’s incentives and decision-making at inference time. That’s why I’m working on an open source tool for governance and validation during inference, rather than trying to solve everything in pre-training.

Better systems can produce better outcomes, even with the same models.

New comment by kaicianflone in "GitHub Agentic Workflows"

kaicianflone — Sun, 08 Feb 2026 17:53:52 +0000

This is a solid step forward on execution safety for agentic workflows. Permissions, sandboxing, MCP allowlists, and output sanitization all matter. But the harder, still unsolved problem is decision validation, not execution constraints. Most real failures come from agents doing authorized but wrong things with high confidence: hallucinations, shallow agreement, or optimizing for speed while staying inside the permission box.

I’m working on an open source project called consensus-tools that sits above systems like this and focuses on that gap. Agents do not just act, they stake on decisions. Multiple agents or agents plus humans evaluate actions independently, and bad decisions have real cost. This reduces guessing, slows risky actions, and forces higher confidence for security-sensitive decisions. Execution answers what an agent can do. Consensus answers how sure we are that it should do it.
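
A toy sketch of what "staking on decisions" could look like, to make the idea concrete. This is not the consensus-tools API; `Vote`, `decide`, `settle`, the 0.66 threshold, and the 50% slash are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    agent: str      # agent or human reviewer
    approve: bool   # independent judgment on the proposed action
    stake: float    # how much the voter is willing to lose if wrong


def decide(votes: list[Vote], threshold: float = 0.66) -> bool:
    """Approve only if enough of the total stake backs the action."""
    total = sum(v.stake for v in votes)
    if total == 0:
        return False
    approving = sum(v.stake for v in votes if v.approve)
    return approving / total >= threshold


def settle(balances: dict[str, float], votes: list[Vote],
           outcome_was_good: bool, slash: float = 0.5) -> dict[str, float]:
    """Voters on the wrong side lose a fraction of what they staked."""
    updated = dict(balances)
    for v in votes:
        if v.approve != outcome_was_good:
            updated[v.agent] -= v.stake * slash
    return updated


votes = [Vote("reviewer-a", True, 3.0), Vote("reviewer-b", False, 1.0)]
print(decide(votes))  # True: 75% of the stake approves
print(settle({"reviewer-a": 10.0, "reviewer-b": 10.0}, votes, outcome_was_good=False))
# {'reviewer-a': 8.5, 'reviewer-b': 10.0}
```

Raising `threshold` for security-sensitive actions is one way to force higher confidence before anything executes.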

New comment by kaicianflone in "Software factories and the agentic moment"

kaicianflone — Sat, 07 Feb 2026 18:46:59 +0000

I don’t disagree. After decades, it’s still hard, which is exactly why I think treating validation as a system problem matters.

We’ve spent years systematizing generation, testing, and deployment. Validation largely hasn’t changed, even as the surface area has exploded. My interest is in making that human effort composable and inspectable, not pretending it can be eliminated.