<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: 2001zhaozhao</title><link>https://news.ycombinator.com/user?id=2001zhaozhao</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 17:43:27 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=2001zhaozhao" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by 2001zhaozhao in "Why Your "AI-First" Strategy Is Probably Wrong"]]></title><description><![CDATA[
<p>This is a very information-dense post. Took some time to read it in detail. Here are my thoughts.<p>> OpenAI published a concept in February 2026 that captured what we'd been doing. They called it harness engineering: the primary job of an engineering team is no longer writing code. It is enabling agents to do useful work. When something fails, the fix is never "try harder." The fix is: what capability is missing, and how do we make it legible and enforceable for the agent?<p>This matches what I've suspected for a while from working on my personal projects. Thanks for laying it out in clear terms.<p>> A production system needs to be stable, reliable, and secure. You need a system that can guarantee those properties when AI writes the code. You build the system. The prompts are disposable.<p>I agree, and the implication is that the primary bottleneck in any engineering project today is AI workflow design, more than anything else, even more than working on the project itself. Having the right AI workflow/scaffold/process lets you 10x the productivity of everything else on the project while keeping things production-ready, and keeping things production-ready is really hard.<p>> The Product Management Bottleneck<p>> The QA Bottleneck<p>So now, not only do devs become software architects who design dev processes and high-level direction more than doing the development themselves; PM and QA also need to become PM/QA architects who design PM/QA processes and product direction to stay relevant. lol.<p>> The Headcount Bottleneck<p>I think it's still an unsolved problem whether AI will reduce the cooperation bottleneck between people (through new cooperation technologies like knowledge consolidation, and AI-driven performance measurement that is harder to game) or increase it (through deep individual knowledge becoming more important since everyone is an architect). 
I'd guess the latter for the short term and possibly the former for the long term.<p>> I had to unify all the code into a single monorepo. One reason: so AI could see everything.<p>I wonder whether it's better for Git history cleanliness purposes to do one of the following instead:<p>- Use a "hub" monorepo that uses Git submodules to link to all the other repos in the project. The hub repo contains documentation and AI agent configurations, but the individual project files stay in their respective repos.<p>- Use an agent harness system that natively wraps over multiple repositories. (More precisely, it would make a temp folder and put the worktrees of multiple repos in that folder. Perhaps it can unpack some documentation and AI agent configs in the root too, with the root repository simply gitignore-ing the individual repo folders instead of using submodules.)<p>> Every pull request triggers three parallel AI review passes using Claude Opus 4.6:
Pass 1: Code quality. Logic errors, performance issues, maintainability.
Pass 2: Security. Vulnerability scanning, authentication boundary checks, injection risks.
Pass 3: Dependency scan. Supply chain risks, version conflicts, license issues.<p>I agree that automated PR review with AI agents is very important. Good list of topics, I think this will help with my own implementation.<p>> One hour later, the triage engine runs. It clusters production errors from CloudWatch and Sentry, scores each cluster across nine severity dimensions, and auto-generates investigation tickets in Linear. Each ticket includes sample logs, affected users, affected endpoints, and suggested investigation paths.<p>This is cool, advanced stuff. Though I kind of think that instead of Linear, we need an AI-centric ticketing system designed from the ground up to make it easier for AIs to handle the tickets and for the humans to monitor said AIs. I've used some AI coding kanban board tools and found them to be very helpful (compared to using a separate Forgejo kanban board + AI agent), and maybe a more general AI-powered ticket management tool would be the next step.<p>> Each tool handles one phase. No tool tries to do everything.<p>I think the key is to have separate agents handling each phase. They could all be in the same tool. I agree that having one AI agent handle the entire thing isn't going to be enough for the kind of reliability one is looking for here.<p>> Graphite's merge queue rebases, re-runs CI, merges if green.<p>This is a tool I hadn't heard of before and the merge queue seems like a very useful concept. I wonder if it handles automatically resolving trivial rebase conflicts with AI. The stacked PR feature sounds pretty good too.<p>> People assume we're trading quality for speed. User engagement went up. Payment conversion went up. We produce better results than before, because the feedback loops are tighter. You learn more when you ship daily than when you ship monthly.<p>Obviously lofty claims but intuitively I think this is possible. AI output isn't perfect but current engineering teams are far from perfect either. 
And I think AI is more amenable to process design than people are, simply because you can change the AI prompt instantly (and perhaps even A/B-test it with an LLM as judge?), but people need time to train on a new process.<p>> At CREAO, we pushed AI-native operations into every function:
Product release notes: AI-generated from changelogs and feature descriptions.
Feature intro videos: AI-generated motion graphics.
Daily posts on socials: AI-orchestrated and auto-published.
Health reports and analytics summaries: AI-generated from CloudWatch and production databases.<p>Using AI for public-facing announcements is a bit of a minefield to be honest. I think it's valuable to have knowledgeable humans do most of this. But maybe AI can be acceptable if you clearly label that it's AI and you genuinely don't have the human bandwidth to do it anymore.<p>> I believe one-person companies will become common. If one architect with agents can do the work of 100 people, many companies won't need a second employee.<p>Oh boy.<p>> the CTO working 18-hour days<p>This is actually the least believable part of this post to me. I'd somewhat believe it if you said 14-16 hours, but working an 18-hour day seems like a straight-up bad idea. Even assuming you value absolutely nothing else in life other than work, you'd get more done in 14-16 hours with more leisure and sleep than in 18 without it.</p>
]]></description><pubDate>Tue, 14 Apr 2026 00:12:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47759621</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47759621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47759621</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "I ran Gemma 4 as a local model in Codex CLI"]]></title><description><![CDATA[
<p>I think it might be a good idea to make some kind of local-first harness designed to fully saturate local hardware churning through experiments on Gemma 4 (or another local model) 24/7, only occasionally calling Claude Opus for big architectural decisions and hard-to-fix bugs.<p>Something like:<p>* Human + Claude Opus sets up project direction and identifies research experiments that can be performed by a local model<p>* Gemma 4 on local hardware autonomously performs smaller research experiments / POCs, including autonomous testing and validation steps that burn a lot of tokens but can convincingly prove that the POC works. This is automatically scheduled to fully utilize the local hardware. There might even be a prioritization system so that these POC experiments only run when there's no more urgent request on the local hardware. The local model has an option to call Opus if it's truly stuck on a task.<p>* Once an approach is proven through the experimentation, the human works with Opus to implement it into the main project from scratch<p>If you can get a complex harness to work on models of this weight class paired with the right local hardware (maybe your old gaming GPU plus 32GB of RAM), you can churn through millions of output tokens a day (and probably like ~100 million input tokens, though the vast majority are cached). The main cost advantage compared to cloud models is actually that you have total control over prompt caching locally, which makes it basically free, whereas most API providers for small LLM models charge full price for input tokens even when the prompt is repeated exactly across every request.</p>
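The prioritization idea above can be sketched as a toy scheduler. Everything here is hypothetical: the model names are placeholders and the dispatch logic is invented purely to illustrate "urgent work preempts background POCs, stuck tasks escalate to the big model":

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                                  # lower number = more urgent
    name: str = field(compare=False)
    stuck: bool = field(default=False, compare=False)

class LocalFirstScheduler:
    """Toy sketch of a local-first harness: urgent interactive requests
    (priority 0) preempt background POC experiments (priority 10), and a
    task flagged as stuck is routed to the expensive cloud model instead
    of burning more local cycles. Backend names are made up."""

    def __init__(self):
        self.queue = []

    def submit(self, name, urgent=False, stuck=False):
        heapq.heappush(self.queue, Task(0 if urgent else 10, name, stuck))

    def run_next(self):
        # Pop the most urgent task and pick a backend for it.
        if not self.queue:
            return None
        task = heapq.heappop(self.queue)
        backend = "cloud-frontier-model" if task.stuck else "local-small-model"
        return (task.name, backend)

sched = LocalFirstScheduler()
sched.submit("poc: benchmark cache layout")
sched.submit("fix prod bug", urgent=True)
sched.submit("poc: tricky refactor", stuck=True)
first = sched.run_next()
print(first)  # the urgent task is dispatched first, on the local model
```

A real harness would also need worktree isolation per task and a hardware-utilization signal, but the queue-plus-escalation shape is the core of it.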
]]></description><pubDate>Mon, 13 Apr 2026 19:41:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47756907</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47756907</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47756907</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Someone bought 30 WordPress plugins and planted a backdoor in all of them"]]></title><description><![CDATA[
<p>That there is a preexisting way for people to get hacked doesn't seem to be a reason to dismiss other, new ways for people to get hacked.</p>
]]></description><pubDate>Mon, 13 Apr 2026 19:38:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47756878</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47756878</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47756878</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Launch HN: Twill.ai (YC S25) – Delegate to cloud agents, get back PRs"]]></title><description><![CDATA[
<p>I'm speaking from my daily experience. Sometimes I don't want to close my laptop before going to bed because there are still 1-2 tasks ongoing in my AI kanban board, so I just leave my laptop open (locked but not suspended) so that the agents keep working for a while. I don't even have things all that automated.<p>I anticipate that once I have some more complex agentic scaffolds set up to do things like automatically explore promising directions for the project, leaving the AI system on overnight will become a necessity.</p>
]]></description><pubDate>Fri, 10 Apr 2026 20:36:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47723372</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47723372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47723372</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Launch HN: Twill.ai (YC S25) – Delegate to cloud agents, get back PRs"]]></title><description><![CDATA[
<p>I would point out that a beefy desktop is probably faster at compiling code than a typical cloud instance simply due to more CPU performance. So up to maybe 10-ish concurrent agents, it's faster to use a local desktop than a cloud instance; beyond that you start to get into the territory where multiple agents are compiling code at the same time, and the cloud setup starts to win. (That's assuming the codebase takes a while to compile and pegs your CPU at 100% while doing so. If the codebase is faster to compile or uses fewer threads, then the breakeven agent count is even higher.)<p>Other than that, I agree with what you said. I don't know what the tradeoffs between local on-premises and cloud agents are in other areas like convenience, but I do think that scalability in the cloud is a big advantage.</p>
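That breakeven can be put into a toy back-of-envelope model. All the compile times below are invented for illustration; the point is only the shape of the tradeoff (one fast shared machine vs. one slower dedicated instance per agent):

```python
def desktop_compile_secs(n_agents, solo_secs=15):
    # One fast desktop CPU shared by all agents: assume perfectly fair
    # time-slicing, so each compile slows down linearly with the number
    # of concurrent compiles. (Invented number: 15s solo compile.)
    return solo_secs * n_agents

def cloud_compile_secs(n_agents, per_instance_secs=160):
    # One slower cloud instance per agent, so no contention: compile
    # time stays flat regardless of agent count. (Invented: 160s.)
    return per_instance_secs

# Smallest agent count at which a compile on the shared desktop becomes
# slower than on a dedicated cloud instance.
breakeven = min(
    n for n in range(1, 101)
    if desktop_compile_secs(n) > cloud_compile_secs(n)
)
print(breakeven)  # 11 with these invented numbers
```

Shrinking `solo_secs` relative to the cloud instance's time raises the breakeven agent count, which matches the caveat about fast-to-compile codebases above.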
]]></description><pubDate>Fri, 10 Apr 2026 19:30:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47722605</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47722605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47722605</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Launch HN: Twill.ai (YC S25) – Delegate to cloud agents, get back PRs"]]></title><description><![CDATA[
<p>Coding agents that run 24/7 are pretty clearly the direction the industry is going now. I think we'll need either on-premises or cloud solutions, since obviously if you need an agent to run 24/7 then it can't live on your laptop.<p>Obviously cloud is better for making money, and some kind of VPC or local cloud solution is best for enterprise, but perhaps for individual devs, a self-hosted system on a home desktop computer running 24/7 (hybrid desktop / server) would be the best solution?</p>
]]></description><pubDate>Fri, 10 Apr 2026 18:53:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47722143</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47722143</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47722143</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Instant 1.0, a backend for AI-coded apps"]]></title><description><![CDATA[
<p>Wow, the demo-in-a-blogpost is really impressive.</p>
]]></description><pubDate>Fri, 10 Apr 2026 07:40:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47714841</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47714841</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47714841</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "ChatGPT Pro now starts at $100/month"]]></title><description><![CDATA[
<p>The title is misleading. The only thing they seem to have done was add a $100 plan identical to Claude's, which gives 5x usage of ChatGPT Plus. There is still a $200 plan that gives 20x usage.</p>
]]></description><pubDate>Thu, 09 Apr 2026 18:16:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47707368</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47707368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47707368</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Claude Managed Agents"]]></title><description><![CDATA[
<p>It's probably not; they are hiking the price to 5x for companies with access to it (or 1.67x of Opus 4.1).</p>
]]></description><pubDate>Thu, 09 Apr 2026 00:00:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47697718</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47697718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47697718</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Muse Spark: Scaling towards personal superintelligence"]]></title><description><![CDATA[
<p>The "AIME Evolution" graph seems interesting. I wonder if other labs are doing this too to improve the reasoning performance of their models.<p>> Think longer to solve harder problems
> Compress
> Think longer again</p>
]]></description><pubDate>Wed, 08 Apr 2026 22:18:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47697014</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47697014</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47697014</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>It's pretty crazy watching AI 2027 slowly but surely come true. What a world we now live in.<p>SWE-bench Verified going from 80% to 93% in particular sounds extremely significant, given that the benchmark was previously considered pretty saturated and stayed in the 70-80% range for several generations. There must have been some insane breakthrough here akin to the jump from non-reasoning to reasoning models.<p>Regarding the cyberattack capabilities, I think Anthropic might now need to ban even advanced defensive cybersecurity use of the models for the public before releasing them (so people can't trick them into attacking others' systems under the pretense of pentesting). Otherwise we'll have a huge problem with people using them to hack around the internet.</p>
]]></description><pubDate>Tue, 07 Apr 2026 22:00:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47681884</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47681884</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47681884</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw"]]></title><description><![CDATA[
<p>There are going to be a lot of tools coming soon that are "agent-agnostic", i.e. can run on CLIs including Claude Code. I am personally experimenting with using a combo of MCP + custom UI layer to provide custom tools with bespoke UX and thus turn Claude Code (or any other CLI agent for that matter) into whatever I want. I wonder how they'll deal with that.<p>For a good existing example developed by a known company, check Cline Kanban: <a href="https://cline.bot/kanban" rel="nofollow">https://cline.bot/kanban</a><p>They don't have the MCP-bundling idea that I'm experimenting with, however.</p>
]]></description><pubDate>Fri, 03 Apr 2026 23:27:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47633682</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47633682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47633682</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "AMD's Ryzen 9 9950X3D2 Dual Edition crams 208MB of cache into a single chip"]]></title><description><![CDATA[
<p>I don't really see a huge reason to buy this other than it being a top-tier halo product.<p>For gaming, AMD already pins the game threads to the CCD with the extra cache pretty well.<p>For multi-threaded workloads the gain from having cache on both CCDs is quite small.</p>
]]></description><pubDate>Sat, 28 Mar 2026 08:20:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47552629</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47552629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47552629</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "AMD's Ryzen 9 9950X3D2 Dual Edition crams 208MB of cache into a single chip"]]></title><description><![CDATA[
<p>Yeah the only way to run 4 sticks of DDR5 decently is with Intel. It's a bit of a shame that you can't cram enough RAM to run big models.<p>The most I could get running on 10GB VRAM + 96GB RAM was a REAP'd + quantized version of MiniMax-M2.5</p>
]]></description><pubDate>Sat, 28 Mar 2026 08:17:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47552616</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47552616</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47552616</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "HyperAgents: Self-referential self-improving agents"]]></title><description><![CDATA[
<p>AGI-MegaAgent 5.7 Pro Ultra</p>
]]></description><pubDate>Thu, 26 Mar 2026 20:40:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47535424</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47535424</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47535424</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Slowing Down in the Age of Coding Agents"]]></title><description><![CDATA[
<p>> I usually have one or two tasks running, but I don't feel pressure to maximize parallelism. Most of my brain is already occupied with thinking hard about the shape of what I'm building — and that work can't be parallelized. I can't do more; anything on top would incur a task switching cost.<p>Maybe there should be new agent UIs that help you get the most out of a single work stream.<p>For example, being able to spin up an asynchronous agent in a side window that can iterate with you on ideas while the main agent works, and then send any outputs to the main agent easily.</p>
]]></description><pubDate>Tue, 24 Mar 2026 22:54:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510707</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47510707</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510707</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Goodbye to Sora"]]></title><description><![CDATA[
<p>We need a 'killed by OpenAI' site now</p>
]]></description><pubDate>Tue, 24 Mar 2026 22:24:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510365</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47510365</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510365</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Goodbye to Sora"]]></title><description><![CDATA[
<p>They wanted network effects because ChatGPT was sorely lacking any.<p>I actually thought the Sora app was promising at launch, at least on paper, but it seems like they failed to keep people's attention long term. With the failure of Sora I don't think they have good options left.</p>
]]></description><pubDate>Tue, 24 Mar 2026 22:22:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510335</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47510335</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510335</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Missile defense is NP-complete"]]></title><description><![CDATA[
<p>From my extremely uneducated point of view it seems like that is true and probably what is already happening in Ukraine. However, at some point robots might be able to take and hold ground, and maybe they can be designed to require only decentralized, automated infrastructure to operate that is hard to strike economically even with drones. At that point, may the side with the most robots win.</p>
]]></description><pubDate>Tue, 24 Mar 2026 22:17:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510272</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47510272</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510272</guid></item><item><title><![CDATA[New comment by 2001zhaozhao in "Epic Games to cut more than 1k jobs as Fortnite usage falls"]]></title><description><![CDATA[
<p>> pulling a Valve<p>> Video games are the exact opposite of Infinite Growth Forever. People get bored and move on.<p>To me, Epic Games were clearly trying to "pull a Valve" and capture the platform magic that allows Valve and other platforms like Roblox to be sustainably profitable. Obviously they have their own game store, but they also have a Fortnite Creative / UEFN (Unreal Editor for Fortnite) platform where people can create minigames inside Fortnite that work similarly to Roblox.<p>They even had the right idea for a while: refusing in-app transactions in their Fortnite Creative platform to encourage actually fun games rather than greedy games that prey on players. Unfortunately they had to walk back that system recently, which I now assume was for the same financial reason as this new layoff.<p>I think their idea didn't work for two reasons. First, they locked down the UEFN platform too hard, leaving few options for developers to modify core gameplay features like movement and the player controller. Devs like me who wanted more control over the player character and game mechanics were severely restricted: if that was intentional, it was a bad call, and if it was unintentional, it shows that UEFN was too technically half-baked when they launched it. Second, Fortnite already had the reputation of being "just that Battle Royale game", so people didn't innovate far beyond the game's base gameplay, unlike Roblox, which was more like a true game engine / platform where every genre was possible. This kind of doomed their plan to compete head-to-head with Roblox from the start.</p>
]]></description><pubDate>Tue, 24 Mar 2026 22:03:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510099</link><dc:creator>2001zhaozhao</dc:creator><comments>https://news.ycombinator.com/item?id=47510099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510099</guid></item></channel></rss>