<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: visarga</title><link>https://news.ycombinator.com/user?id=visarga</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 22:55:46 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=visarga" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by visarga in "Anthropic downgraded cache TTL on March 6th"]]></title><description><![CDATA[
<p>I want to differentiate two kinds of harnesses:<p>1. OpenClaw-like: using the LLM endpoint on subscription billing, with different prompts than Claude Code<p>2. Using the claude CLI with -p, in headless mode<p>The second kind runs through their own code and prompts, and just calls claude in non-interactive mode for subtasks. I am especially put off by restricting the second kind; I need it to run judge agents that review plans and code.</p>
]]></description><pubDate>Sun, 12 Apr 2026 17:47:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47742402</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47742402</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47742402</guid></item><item><title><![CDATA[New comment by visarga in "Anthropic downgraded cache TTL on March 6th"]]></title><description><![CDATA[
<p>I might consider switching to codex from Claude Pro 20x, but I need the post-tool-use, pre-file-write, and post-user-message hooks. Waiting on codex to deliver.<p>- pre file write -> block editing code files without a task and a plan of work<p>- post tool use -> show the agent the next open checkbox in the task, like an instruction pointer<p>- post user message -> log all user messages for periodic review of intent alignment<p>These three hooks plus plain md files make up my Claude harness.</p>
]]></description><pubDate>Sun, 12 Apr 2026 17:15:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47742095</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47742095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47742095</guid></item><item><title><![CDATA[New comment by visarga in "Google open-sources experimental agent orchestration testbed Scion"]]></title><description><![CDATA[
<p>I built a similar harness; mine does lightweight sandboxing with Seatbelt on Mac and Bubblewrap on Linux. I initially used Docker too, but abandoned it. I like how these two sandboxes let me make the whole file system read-only except the project folder (and a few other config folders), which stays read-write. This means my code runs inside the sandbox the same as outside: the same paths hold, same file system. The .git folder is also read-only inside the sandbox; only the outside agent can commit. Sandboxing was intended to enable --yolo mode; I wanted to maximize autonomous time.<p>Work is divided into individual tasks. I could have used Plan Mode or the TodoWrite tool to implement tasks - all agents have them nowadays. But instead I chose to plan in task.md files because they can be edited iteratively: they start as a user request, develop into a plan with checkbox-able steps, the plan is reviewed by a judge agent (in yolo mode, with fresh context), then a worker agent works through the gates. The gates enforce a workflow of testing early and testing extensively. There is a second judge for the implementation, again in yolo mode. And at the end we update the memory/bootstrap document.<p>Task files go into the git repo. I also log all user messages and implement intent validation with the judge agents. The judges validate intent along the chain "chat -> task -> plan -> code -> tests". Nothing is lost; the project remembers and understands its history. In fact I like to run retrospective tasks where a task.md 'eats' previous tasks and produces a general project perspective not visible locally.<p>In my system everything is an md file, logged and versioned in git. You have no trouble extracting your memories; in fact I made reflection on past work a primitive operation of this harness. I am using it primarily for coding, but it is just as good for deep research, literature reviews, organizing subject matter and tutoring me on topics, investment planning, and orchestrating agent experiment loops like autoresearch. That is because task.md is just a generic programming pipeline; the gates are instructions in natural language, so you can use it for any cognitive work. The longest task.md I ran was 700 steps; it took hours to complete but worked reliably.<p><a href="https://github.com/horiacristescu/claude-playbook-plugin" rel="nofollow">https://github.com/horiacristescu/claude-playbook-plugin</a></p>
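A judge run is just a headless claude call over the task file; roughly like this sketch (the -p and --output-format flags reflect my understanding of the claude CLI and should be treated as assumptions; check claude --help for your version):

```python
# Sketch of invoking a judge agent headlessly over a task.md.
# The -p / --output-format flags are assumptions about the claude CLI;
# verify against your installed version.
import subprocess

def judge_cmd(task_path: str, role: str) -> list[str]:
    """Build the headless judge command for a given task file."""
    prompt = (
        f"You are the {role} judge. Read {task_path} and verify the "
        "chain chat -> task -> plan -> code -> tests. Report mismatches."
    )
    return ["claude", "-p", prompt, "--output-format", "json"]

def run_judge(task_path: str, role: str) -> str:
    # Fresh context every time: each invocation is a new headless session.
    out = subprocess.run(judge_cmd(task_path, role),
                         capture_output=True, text=True)
    return out.stdout
```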
]]></description><pubDate>Wed, 08 Apr 2026 05:06:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47685523</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47685523</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47685523</guid></item><item><title><![CDATA[New comment by visarga in "The threat is comfortable drift toward not understanding what you're doing"]]></title><description><![CDATA[
<p>> Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant.<p>I think this is a simplification: of course Bob relied on AI, but he also used his own brain to think about the problem. Bob is not reducible to "a competent prompt engineer"; if you think that, just take any person who prompts about things unrelated to physics and ask them to do Bob's work.<p>In fact Bob might have a chance to cover more mileage at the higher level of work while Alice does the same at the lower level. Which is better? It depends on how AI evolves.<p>The article assumes the alternative to AI-assisted work is careful human work. I am not sure careful human work is all that good, or that it will scale well in the future. Better to rely on AI on top of careful human work.<p>My objection comes from remembering how senior devs review PRs: "LGTM". It's pure vibes. To seriously review a PR you have to run it, test it, check its edge cases, evaluate its performance - more work than making the PR itself. The entire history of software is littered with bugs that sailed through review, because review is performative most of the time.<p>Anyone remember the replication crisis in science?</p>
]]></description><pubDate>Sun, 05 Apr 2026 14:52:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47650052</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47650052</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47650052</guid></item><item><title><![CDATA[New comment by visarga in "Components of a Coding Agent"]]></title><description><![CDATA[
<p>Yes, judges should not just look for bugs; they should also validate that intent was followed, but that can only happen when intent was preserved. I chose to save the user messages as a compromise; they are probably 10-100x smaller than the full session. I think tasks themselves are one step below pure user intent. Anyway, if you didn't log user messages you can still recover them from session files, if they have not been removed.<p>One interesting data point: I compared the word count of my chat messages vs the final code and they came out about 1:1, but in reality a programmer would type 10x the final code during development. From a different perspective, I found I created 10x more projects since relying on Claude and my harness than before. So it looks like expressing user intent is about 10x more effective than manual coding now.</p>
]]></description><pubDate>Sun, 05 Apr 2026 14:39:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47649936</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47649936</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47649936</guid></item><item><title><![CDATA[New comment by visarga in "12k AI-generated blog posts added in a single commit"]]></title><description><![CDATA[
<p>Humans are also unreliable; we are competing for scarce attention, platforms decide what gets visibility, and we cater to their algorithms. You could say humans are prompted by feed-ranking AI on what and how to publish.</p>
]]></description><pubDate>Sun, 05 Apr 2026 06:25:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646643</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47646643</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646643</guid></item><item><title><![CDATA[New comment by visarga in "Components of a Coding Agent"]]></title><description><![CDATA[
<p>My own approach also has intent sitting at the top: intent justifies plan justifies code justifies tests. And the other way around: tests satisfy code, satisfy plan, satisfy intent. These bottom-up and top-down threads are validated by judge agents.<p>I also make individual tasks into md files (task.md), which lets them carry intent and plan, not just checkbox-driven "- [ ]" gates; they get annotated with outcomes and become a workbook after execution. The same task.md is seen twice by judge agents that run without extra context: the plan judge and the implementation judge.<p>I ran tests to see which component of my harness contributes the most, and it turned out to be the judges. Apparently claude code can solve a task just as well with or without a task file, but the existence of the task file makes plans and work more auditable, not just for bugs but for whether intent was followed.<p>Coming back to user intent: I have a post-user-message hook that writes user messages to a project-scoped chat_log.md file, which means all user messages are preserved (user text << agent text, so it is efficient). When we start a new task, the chat log is checked to see if intent was properly captured. I also use it to recover context across sessions and remember what we did last.<p>Once every 10-20 tasks I run a retrospective task that inspects all task.md files since the last retro and judges how the harness performs and how the project is going. This can detect things not apparent in task-level work, for example when multiple tasks implement a more complex feature, or when a subsystem is touched by multiple tasks. I think reflection is the one place where the harness itself, and how we use it, can be refined.<p><pre><code>    claude plugin marketplace add horiacristescu/claude-playbook-plugin

    source at https://github.com/horiacristescu/claude-playbook-plugin/tree/main</code></pre></p>
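The post-user-message hook is tiny; here is a sketch, assuming the hook receives a JSON payload with a "prompt" field on stdin (my understanding of Claude Code's UserPromptSubmit event; the field name is an assumption, adapt to your agent's hook schema):

```python
# Sketch of the post-user-message hook: append each user message to a
# project-scoped chat_log.md for later intent review. The "prompt" key
# is an assumption about the hook's stdin JSON payload.
import datetime
from pathlib import Path

def log_user_message(payload: dict, log_path: Path) -> None:
    """Append one timestamped user message to the chat log."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"## {stamp}\n{payload.get('prompt', '')}\n\n")
```

Wired in as the hook entry point it would call this with json.load(sys.stdin) and Path("chat_log.md").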
]]></description><pubDate>Sun, 05 Apr 2026 05:26:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646352</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47646352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646352</guid></item><item><title><![CDATA[New comment by visarga in "Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw"]]></title><description><![CDATA[
<p>I built a harness where my plans and code are reviewed with 'claude -p', but most work is interactive; now it has been wrecked. I relied on and integrated with Anthropic only to get burned. I'm not even maxing out my plan; I never surpassed 60%. But now I have to pay API pricing on top? This tells me how trustworthy Anthropic is: if you depend on any specific feature, you are at their mercy.<p>Prior to Anthropic I had bad experiences with Windsurf and Cursor, same shit: I pay for the plan, they shrink my usage quota after a short time, a couple of weeks or months. I never returned to Windsurf after they abused me, and never used Cursor after I got my Claude sub. I have no idea where I'll end up next. Too bad Anthropic is pushing my $200/mo away.</p>
]]></description><pubDate>Sat, 04 Apr 2026 16:29:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47640519</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47640519</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47640519</guid></item><item><title><![CDATA[New comment by visarga in "Marc Andreessen is wrong about introspection"]]></title><description><![CDATA[
<p>> In their minds, financial success is the ultimate yardstick.<p>In a loopy, recursive way, it is. Cost gates what we can do and become. Paying back your costs to extend your runway is the working principle behind biology, the economy, and technology. I am not saying rich people are always right, just that cost is not so irrelevant to everything else. I personally think cost satisfaction explains multiple levels, from biology up.<p>Related to introspection: there is certainly a cost to doing it, and a cost to not doing it. Going happy-go-lucky is not necessarily optimal; experience was expensive to gain, and not using it at all is a big loss. Being paralyzed by rumination is also not optimal; we have to act in time, we can't delay, and if we do, it comes out differently.</p>
]]></description><pubDate>Fri, 03 Apr 2026 16:13:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47628506</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47628506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47628506</guid></item><item><title><![CDATA[New comment by visarga in "Cursor 3"]]></title><description><![CDATA[
<p>I run Claude Code from Zed. Very nice experience.</p>
]]></description><pubDate>Thu, 02 Apr 2026 20:39:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47619876</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47619876</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47619876</guid></item><item><title><![CDATA[New comment by visarga in "Claude Code's source code has been leaked via a map file in their NPM registry"]]></title><description><![CDATA[
<p>Claude Code says thank you for reporting, I bet they will scan this chat to see what bugs they need to fix asap.</p>
]]></description><pubDate>Tue, 31 Mar 2026 16:52:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47590189</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47590189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47590189</guid></item><item><title><![CDATA[New comment by visarga in "Google's 200M-parameter time-series foundation model with 16k context"]]></title><description><![CDATA[
<p>ARIMA and ARMA models</p>
]]></description><pubDate>Tue, 31 Mar 2026 06:07:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583337</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47583337</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583337</guid></item><item><title><![CDATA[New comment by visarga in "Do your own writing"]]></title><description><![CDATA[
<p>The cognitive benefit of writing comes from externalizing and evaluating ideas under friction. LLM conversation provides more friction per unit time than solo drafting because you're constantly reacting to a semi-competent interlocutor who gets it almost right, in ways that force you to articulate exactly where it went wrong.<p>I checked my logs, and I write 10 words in chat for every 1 word of LLM output that ends up in the final text. So it's clearly not making me type less. I used to type about 10K words per month; now I type 50-100K words per month (LLM chat is the difference).<p>The surplus capacity provided by LLMs got reinvested immediately in expanding scope and depth. I did not get to spend 10x less time writing.</p>
]]></description><pubDate>Tue, 31 Mar 2026 05:23:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47583056</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47583056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47583056</guid></item><item><title><![CDATA[New comment by visarga in "Coding agents could make free software matter again"]]></title><description><![CDATA[
<p>> it certainly feels against the spirit of what I intended when distributing my works<p>You can own the works, but not the vibes. If everyone owned the vibes, we would all be infringing on others. In my view abstractions should not be protected by copyright, only expression; currently the abstraction-filtration-comparison (AFC) standard protects abstractions too, and non-literal infringement is a thing.<p>Trying to own the vibes is like trying to own the functionality itself, no matter the distinct implementation details, and this is closer to patents than to copyright. But patents get researched for prior art and have a limited duration, while copyright is automatic and of almost infinite duration.</p>
]]></description><pubDate>Mon, 30 Mar 2026 03:38:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47570144</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47570144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47570144</guid></item><item><title><![CDATA[New comment by visarga in "Coding agents could make free software matter again"]]></title><description><![CDATA[
<p>I've been saying LLMs are more open than open source for some time...</p>
]]></description><pubDate>Mon, 30 Mar 2026 03:34:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47570121</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47570121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47570121</guid></item><item><title><![CDATA[New comment by visarga in "ARC-AGI-3"]]></title><description><![CDATA[
<p>ARC is trying to isolate a unitary intelligence signal, so it strips away coordination, specialization, and division of labor. But that also removes one of the dominant mechanisms by which intelligence actually scales in the real world. Their view of intelligence implicitly treats redundancy as necessary - one agent must do it all - and treats efficiency as something achieved internally rather than through restructuring the system. At the very least they should create environments that help an agent compound intelligence, self-amplify, and support itself; that is not happening in ARC.<p>Has anyone wondered whether ARC is a measure of intelligence or just a collection of hand-picked tasks? Is there any proof that such short tasks in miniature environments encode anything meaningful about intelligence? One-shot intelligence?</p>
]]></description><pubDate>Thu, 26 Mar 2026 18:48:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47534190</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47534190</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47534190</guid></item><item><title><![CDATA[New comment by visarga in "ARC-AGI-3"]]></title><description><![CDATA[
<p>It only tests puzzle solving; intelligence is cost compression that powers itself.</p>
]]></description><pubDate>Thu, 26 Mar 2026 18:03:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47533701</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47533701</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47533701</guid></item><item><title><![CDATA[New comment by visarga in "Personal Encyclopedias"]]></title><description><![CDATA[
<p>That is because platforms both enable us and exploit us: they exploit both those who create and comment and those who read. They perform a necessary function but extract the value from it for their own benefit.</p>
]]></description><pubDate>Thu, 26 Mar 2026 17:02:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47532916</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47532916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47532916</guid></item><item><title><![CDATA[New comment by visarga in "Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon"]]></title><description><![CDATA[
<p>But across a sequence you still have to load most of them.</p>
]]></description><pubDate>Tue, 24 Mar 2026 17:52:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47506561</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47506561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47506561</guid></item><item><title><![CDATA[New comment by visarga in "Show HN: Atomic – Self-hosted, semantically-connected personal knowledge base"]]></title><description><![CDATA[
<p>I did something similar: markdown and code agents for memory, multiple feeds for intake; my own browsing and claude CLI messages also get indexed.</p>
]]></description><pubDate>Sun, 22 Mar 2026 06:34:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47475015</link><dc:creator>visarga</dc:creator><comments>https://news.ycombinator.com/item?id=47475015</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47475015</guid></item></channel></rss>