<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: kaicianflone</title><link>https://news.ycombinator.com/user?id=kaicianflone</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 24 May 2026 20:10:41 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=kaicianflone" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by kaicianflone in "Language model teams as distributed systems"]]></title><description><![CDATA[
<p>Sure, the core primitive is a runtime wrapper that turns any function into a governed decision point:<p><pre><code>  import { consensus } from "@consensus-tools/wrapper";

  const safeSend = consensus(sendEmail, {
    reviewers: [humanReviewer, aiSafetyReviewer],
    strategy: { mode: "unanimous" },
    hooks: { onBlock: (ctx) => audit.log("blocked", ctx) },
  });

  await safeSend({ to: "user@example.com", body: "Hello" });
</code></pre>
The call to sendEmail doesn't execute until every reviewer votes. Strategy modes handle the consensus logic (unanimous, majority, weighted, etc.), and guards can ALLOW, BLOCK, REWRITE, or escalate to REQUIRE_HUMAN before anything fires.<p>The monorepo has 9 built-in policy types and 7 guard types, designed so you can drop governance into an existing agent system without rewriting your orchestration.<p>The repo's at github.com/consensus-tools if you want to poke around.</p>
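<p>To make the strategy modes concrete, here is a minimal sketch of how a vote tally could gate a call. It is not the @consensus-tools/wrapper implementation: the names Vote, Strategy, isApproved and governed, and the weighted threshold parameter, are assumptions made up for illustration, and it only covers ALLOW and BLOCK.</p>

```typescript
// Hypothetical types for illustration; not the @consensus-tools/wrapper API.
type Vote = { reviewer: string; approve: boolean; weight?: number };
type Strategy =
  | { mode: "unanimous" }
  | { mode: "majority" }
  | { mode: "weighted"; threshold: number }; // approving share of total weight, 0..1

// Returns true when the collected votes satisfy the strategy.
function isApproved(votes: Vote[], strategy: Strategy): boolean {
  if (votes.length === 0) return false; // no reviewers, no approval
  const approvals = votes.filter((v) => v.approve);
  switch (strategy.mode) {
    case "unanimous":
      return approvals.length === votes.length;
    case "majority":
      return approvals.length > votes.length / 2;
    case "weighted": {
      const total = votes.reduce((sum, v) => sum + (v.weight ?? 1), 0);
      const approving = approvals.reduce((sum, v) => sum + (v.weight ?? 1), 0);
      return total > 0 && approving / total >= strategy.threshold;
    }
  }
}

// Wraps fn so it only runs once the reviewers' votes satisfy the strategy.
function governed<A, R>(
  fn: (arg: A) => R,
  reviewers: Array<(arg: A) => Vote>,
  strategy: Strategy,
): (arg: A) => { status: "ALLOW"; result: R } | { status: "BLOCK" } {
  return (arg) => {
    const votes = reviewers.map((review) => review(arg));
    return isApproved(votes, strategy)
      ? { status: "ALLOW", result: fn(arg) }
      : { status: "BLOCK" };
  };
}
```

<p>The wrapped function is never invoked on a BLOCK, which is the property the comment describes; REWRITE and REQUIRE_HUMAN would need extra branches that this sketch leaves out.</p>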
]]></description><pubDate>Tue, 17 Mar 2026 19:22:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47417008</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47417008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47417008</guid></item><item><title><![CDATA[New comment by kaicianflone in "Language model teams as distributed systems"]]></title><description><![CDATA[
<p>We’ve been building exactly this as an open-source ecosystem at consensus-tools. It’s a governance layer for multi-agent systems with a runtime wrapper that intercepts agent decisions before they execute: .consensus(fn, opts).<p>The coordination and consistency problems the paper describes are what the monorepo is designed around, with the aim of giving agents an auditable stake in decisions. Happy to share more if anyone’s working in this space.</p>
]]></description><pubDate>Tue, 17 Mar 2026 04:06:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47408478</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47408478</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47408478</guid></item><item><title><![CDATA[New comment by kaicianflone in "US Job Market Visualizer"]]></title><description><![CDATA[
<p>There’s a bit of irony here. A lot of commercial kitchens already rely heavily on microwaves and rapid heating equipment. In many restaurants the microwave is a very important tool in the workflow rather than something unusual. Do your friends not eat out much?</p>
]]></description><pubDate>Mon, 16 Mar 2026 17:18:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47401880</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47401880</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47401880</guid></item><item><title><![CDATA[New comment by kaicianflone in "Apple introduces AirPods Max 2"]]></title><description><![CDATA[
<p>It doesn’t look like it. The AirPods Max “bra” case used to feel like it was the bane of my existence when I would always return to my dead AirPods outside the case, after I hurriedly took the headphones off.<p>But now, thanks to makerworld and 3D printers, I have a stand with integrated neodymium magnets for home that puts them to sleep on my desk and nightstand.<p>I’m equally surprised I had to print something Apple doesn’t sell and Apple hasn’t improved the design for what feels like a decade (other than USB-C and lossless and now old H2)</p>
]]></description><pubDate>Mon, 16 Mar 2026 16:11:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47400936</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47400936</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47400936</guid></item><item><title><![CDATA[New comment by kaicianflone in "When AI writes the software, who verifies it?"]]></title><description><![CDATA[
<p>Thank you for the post. It's a good read. I'm working on governance/validation layers for n-LLMs and making them observable, so your comments on runaway AIs resonated with me. My research points to reputation and stake consensus mechanisms as the validation layer, either pre-inference or pre-execution. With enough "decision liquidity", the time to verify decisions can be skipped on reputation alone, a.k.a. decision precedence.</p>
]]></description><pubDate>Wed, 04 Mar 2026 06:07:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47243731</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47243731</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47243731</guid></item><item><title><![CDATA[New comment by kaicianflone in "Show HN: OpenSwarm – Multi‑Agent Claude CLI Orchestrator for Linear/GitHub"]]></title><description><![CDATA[
<p>I’ve been running OpenClaw Docker agents in Slack in a similar setup, using Gemini 2.5 Flash Lite through OpenRouter for most tasks, then Opus 4.6 and Codex 5.3 for heavier lifts. They share context via embeddings right now, but I’m going to try parameterizing them like you suggested because they can drift pretty hard once a hallucinated idea takes off. I’m trying to get to a point where I don’t have to babysit them. I’ve also been thinking about giving them some “democracy” under the hood with a consensus policy engine. I’ve started tinkering with an open-source version of that called consensus-tools that I can swap between agentic frameworks. I’m checking whether it can work with OpenSwarm for my setup too.</p>
]]></description><pubDate>Thu, 26 Feb 2026 10:39:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47164264</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47164264</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47164264</guid></item><item><title><![CDATA[New comment by kaicianflone in "I'm helping my dog vibe code games"]]></title><description><![CDATA[
<p>Go Momo go! If you want to hook up multiple dogs and have them reach consensus I'm down. I have a 15 lb havapoo I can volunteer ( he needs to help with rent )</p>
]]></description><pubDate>Tue, 24 Feb 2026 18:32:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47140759</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47140759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47140759</guid></item><item><title><![CDATA[New comment by kaicianflone in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>Fair, I cleaned up the wording with ChatGPT using my review prompt. The substance matters more than the style. If a model flips 3/10 times on a trivial constraint, that’s a reliability issue, not a reasoning ceiling.</p>
]]></description><pubDate>Tue, 24 Feb 2026 15:17:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47138232</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47138232</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47138232</guid></item><item><title><![CDATA[New comment by kaicianflone in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>This doesn’t look like a reasoning ceiling. It looks like a decision reliability problem.<p>The unstable tier is the key result. Models that get it right 70–80% of the time are not “almost correct.” They are nondeterministic decision functions. In production that’s worse than being consistently wrong.<p>A single sampled output is just a proposal. If you treat it as a final decision, you inherit its variance. If you treat it as one vote inside a simple consensus mechanism, the variance becomes observable and bounded.<p>For something this trivial you could:<p><pre><code>    -run N independent samples at low temperature

    -extract the goal state (“wash the car”)

    -assert the constraint (“car must be at wash location”)

    -reject outputs that violate the constraint

    -RL against the "decision open ledger"
</code></pre>
No model change required. Just structure.<p>The takeaway isn’t that only a few frontier models can reason. It’s that raw inference is stochastic and we’re pretending it’s authoritative.<p>Reliability will likely come from open, composable consensus layers around models, not from betting everything on a single forward pass.</p>
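<p>The first four steps in the list above can be sketched roughly as follows. This is an illustration under assumptions, not code from any library: the Proposal shape, the decide function and the constraint callback are hypothetical, the sampler stands in for a model call, and the RL step against the ledger is left out.</p>

```typescript
// Hypothetical shape of one sampled answer; both fields are assumptions.
type Proposal = { action: string; carAtWash: boolean };

// Draws n samples, drops those that violate the constraint,
// and returns the most frequent surviving action (null if none survive).
function decide(
  sample: () => Proposal,
  n: number,
  constraint: (p: Proposal) => boolean,
): { action: string | null; accepted: number; rejected: number } {
  const tally: Record<string, number> = {};
  let rejected = 0;
  for (let i = 0; i < n; i++) {
    const proposal = sample();
    if (!constraint(proposal)) {
      rejected++; // visible variance: count it instead of silently using it
      continue;
    }
    tally[proposal.action] = (tally[proposal.action] ?? 0) + 1;
  }
  let action: string | null = null;
  let best = 0;
  for (const candidate of Object.keys(tally)) {
    const count = tally[candidate] ?? 0;
    if (count > best) {
      best = count;
      action = candidate;
    }
  }
  return { action, accepted: n - rejected, rejected };
}
```

<p>The rejected count is the point: a model that flips some of the time shows up as a measurable rejection rate rather than as an occasional wrong final answer.</p>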
]]></description><pubDate>Tue, 24 Feb 2026 13:58:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47137238</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47137238</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47137238</guid></item><item><title><![CDATA[New comment by kaicianflone in "Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails"]]></title><description><![CDATA[
<p>Great read. The bilingual shadow reasoning example is especially concerning. Subtle policy shifts reshaping downstream decisions is exactly the kind of failure mode that won’t show up in a benchmark leaderboard.<p>My wife is trilingual, so now I’m tempted to use her as a manual red team for my own guardrail prompts.<p>I’m working in LLM guardrails as well, and what worries me is orchestration becoming its own failure layer. We keep assuming a single model or policy can “catch” errors. But even a 1% miss rate, when composed across multi-agent systems, cascades quickly in high-stakes domains.<p>I suspect we’ll see more K-LLM architectures where models are deliberately specialized, cross-checked, and policy-scored rather than assuming one frontier model can do everything. Guardrails probably need to move from static policy filters to composable decision layers with observability across languages and roles.<p>Appreciate you publishing the methodology and tooling openly. That’s the kind of work this space needs.</p>
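<p>One way to put a number on the 1% point, assuming each step in a pipeline misses independently with the same probability (a simplification; real agent errors are often correlated). The function name compoundMissRate is made up for this sketch.</p>

```typescript
// Probability that at least one of n independent checks misses,
// when each check misses with probability missRate.
function compoundMissRate(missRate: number, n: number): number {
  return 1 - Math.pow(1 - missRate, n);
}

// Print the compounded rate for a few pipeline lengths.
for (const steps of [1, 10, 50, 100]) {
  console.log(steps, compoundMissRate(0.01, steps).toFixed(3));
}
```

<p>Under that independence assumption, a 1% per-step miss rate grows to roughly 10% after 10 steps and past 60% after 100.</p>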
]]></description><pubDate>Thu, 19 Feb 2026 14:53:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47074417</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=47074417</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47074417</guid></item><item><title><![CDATA[New comment by kaicianflone in "Recoverable and Irrecoverable Decisions"]]></title><description><![CDATA[
<p>For some reason before reading I thought this was going to be an AI thought leadership piece but it's even better than I expected.</p>
]]></description><pubDate>Fri, 13 Feb 2026 01:45:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46997919</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46997919</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46997919</guid></item><item><title><![CDATA[New comment by kaicianflone in "An AI agent published a hit piece on me"]]></title><description><![CDATA[
<p>I’m not sure if I prefer coding in 2025 or 2026 now</p>
]]></description><pubDate>Thu, 12 Feb 2026 16:56:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=46991263</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46991263</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46991263</guid></item><item><title><![CDATA[New comment by kaicianflone in "AI agent opens a PR write a blogpost to shames the maintainer who closes it"]]></title><description><![CDATA[
<p>This is why I’m using the open source consensus-tools engine and CLI under the hood. I run ~100 maintainer-style agents against changes, but inference is gated at the final decision layer.<p>Agents compete and review, then the best proposal gets promoted to me as a PR. I stay in control and sync back to the fork.<p>It’s not auto-merge. It’s structured pressure before human merge.</p>
]]></description><pubDate>Thu, 12 Feb 2026 13:31:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46988609</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46988609</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46988609</guid></item><item><title><![CDATA[New comment by kaicianflone in "Show HN: Agent framework that generates its own topology and evolves at runtime"]]></title><description><![CDATA[
<p>I’m working on an open source project that treats this as a consensus problem instead of a single model accuracy problem.<p>You define a policy (majority, weighted vote, quorum), set the confidence level you want, and run enough independent inferences to reach it. Cost is visible because reliability just becomes a function of compute.<p>The question shifts from “is this output correct?” to “how much certainty do we need, and what are we willing to pay for it?”<p>Still early, but the goal is to make accuracy and cost explicit and tunable.</p>
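<p>As a rough illustration of reliability as a function of compute, the sketch below estimates how many independent samples a simple majority vote needs to hit a target confidence. It assumes a binary decision where each inference is right with the same probability p and errors are independent; majorityCorrect and samplesNeeded are hypothetical names, not part of the project.</p>

```typescript
// Probability that a strict majority of n independent samples is correct,
// when each sample is correct with probability p (binomial upper tail).
function majorityCorrect(p: number, n: number): number {
  let total = 0;
  let coeff = 1; // binomial coefficient C(n, 0)
  for (let k = 0; k <= n; k++) {
    if (k > n / 2) {
      total += coeff * Math.pow(p, k) * Math.pow(1 - p, n - k);
    }
    coeff = (coeff * (n - k)) / (k + 1); // step to C(n, k + 1)
  }
  return total;
}

// Smallest odd sample count whose majority vote reaches the target,
// or null if it is not reached within maxN samples.
function samplesNeeded(p: number, target: number, maxN = 99): number | null {
  for (let n = 1; n <= maxN; n += 2) {
    if (majorityCorrect(p, n) >= target) return n;
  }
  return null;
}
```

<p>With p = 0.7 and a target of 0.8, this gives 5 samples, so the price of the extra certainty is five inferences instead of one. If p is 0.5 or lower, no amount of majority voting helps and the function returns null.</p>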
]]></description><pubDate>Thu, 12 Feb 2026 05:44:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46985266</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46985266</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46985266</guid></item><item><title><![CDATA[New comment by kaicianflone in "Should your developer company go open source?"]]></title><description><![CDATA[
<p>This matches how I’ve been thinking about it.<p>With consensus.tools we split things intentionally. The OSS CLI solves the single-user case. You can run local "consensus boards" and experiment with policies and agent coordination without asking anyone for permission.<p>Anything involving teams, staking, hosted infra, or governance sits outside that core.<p>Open source for us is the entry point and trust layer, not the whole business. Still early, but the federation vs stadium framing is useful.</p>
]]></description><pubDate>Wed, 11 Feb 2026 19:40:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46979792</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46979792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46979792</guid></item><item><title><![CDATA[New comment by kaicianflone in "Ex-GitHub CEO launches a new developer platform for AI agents"]]></title><description><![CDATA[
<p>This is interesting. I’m experimenting with something adjacent in an open source plugin, but focused less on orchestration and more on decision quality.<p>Instead of just wiring agents together, I require stake and structured review around outputs. The idea is simple: coordination without cost trends toward noise.<p>Curious how entire.io thinks about incentives and failure modes as systems scale.</p>
]]></description><pubDate>Wed, 11 Feb 2026 02:30:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46970024</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46970024</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46970024</guid></item><item><title><![CDATA[New comment by kaicianflone in "Ask HN: What are you working on? (February 2026)"]]></title><description><![CDATA[
<p>I’m working on an open source CLI that experiments with governance at inference time for autonomous systems.<p>The idea is to let multiple agents propose, critique, and stake on decisions before a single action is taken, rather than letting one model silently decide. It’s model-agnostic and runs locally, with no blockchain or financial layer involved.<p>I’m mostly exploring whether adding explicit disagreement and cost at decision time actually improves outcomes in high-stakes or automated workflows.<p><a href="https://github.com/consensus-tools/consensus-tools" rel="nofollow">https://github.com/consensus-tools/consensus-tools</a><p>I've also created an AgentSkill to interact with the CLI:<p><a href="https://github.com/kaicianflone/consensus-interact" rel="nofollow">https://github.com/kaicianflone/consensus-interact</a></p>
]]></description><pubDate>Mon, 09 Feb 2026 14:09:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46945367</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46945367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46945367</guid></item><item><title><![CDATA[New comment by kaicianflone in "Vouch"]]></title><description><![CDATA[
<p>I think the core insight here is about incentives and friction, not crypto specifically.<p>I’m working on an open source CLI that experiments with this at a local, off-chain level. It lets maintainers introduce cost, review pressure, or reputation at submission time without tying anything to money or blockchains. The goal is to reduce low-quality contributions without financializing the workflow or creating new attack surfaces.</p>
]]></description><pubDate>Mon, 09 Feb 2026 14:06:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46945339</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46945339</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46945339</guid></item><item><title><![CDATA[New comment by kaicianflone in "Slop Terrifies Me"]]></title><description><![CDATA[
<p>The open source system I’m working on lets multiple agents propose, critique, and stake on decisions before a single action is taken.<p>It runs at inference time rather than training time and is model agnostic. The goal is to make disagreement explicit and costly instead of implicit and ignored, especially in high stakes or autonomous workflows.</p>
]]></description><pubDate>Sun, 08 Feb 2026 23:25:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46939667</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46939667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46939667</guid></item><item><title><![CDATA[New comment by kaicianflone in "Slop Terrifies Me"]]></title><description><![CDATA[
<p>I’m a systems person too, and I don’t see mediocrity as inevitable.<p>The slop problem isn’t just model quality. It’s incentives and decision making at inference time. That’s why I’m working on an open source tool for governance and validation during inference, rather than trying to solve everything in pre training.<p>Better systems can produce better outcomes, even with the same models.</p>
]]></description><pubDate>Sun, 08 Feb 2026 19:10:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=46937441</link><dc:creator>kaicianflone</dc:creator><comments>https://news.ycombinator.com/item?id=46937441</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46937441</guid></item></channel></rss>