<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: xianshou</title><link>https://news.ycombinator.com/user?id=xianshou</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 08 Apr 2026 00:20:07 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=xianshou" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by xianshou in "A Recipe for Steganogravy"]]></title><description><![CDATA[
<p>Even as someone firmly on the other side of the AI debate, I must appreciate the craft.<p>Now, to give Claude the steganogravy skill...</p>
]]></description><pubDate>Fri, 03 Apr 2026 13:29:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47626463</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47626463</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47626463</guid></item><item><title><![CDATA[New comment by xianshou in "Universal Claude.md – cut Claude output tokens"]]></title><description><![CDATA[
<p>From the file: "Answer is always line 1. Reasoning comes after, never before."<p>LLMs are autoregressive (filling in the completion of what came before), so you'd better have thinking mode on, or the "reasoning" is pure confirmation bias seeded by the answer that gets locked in via the first output tokens.</p>
]]></description><pubDate>Tue, 31 Mar 2026 02:13:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47581985</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47581985</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47581985</guid></item><item><title><![CDATA[New comment by xianshou in "I am definitely missing the pre-AI writing era"]]></title><description><![CDATA[
<p>I appreciate not having to read this guy again.</p>
]]></description><pubDate>Tue, 31 Mar 2026 01:01:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47581571</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47581571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47581571</guid></item><item><title><![CDATA[New comment by xianshou in "The First Fully General Computer Action Model"]]></title><description><![CDATA[
<p>Great work! Why no benchmarks though?</p>
]]></description><pubDate>Thu, 26 Feb 2026 15:11:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47167147</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47167147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47167147</guid></item><item><title><![CDATA[New comment by xianshou in "Show HN: Craftplan – I built my wife a production management tool for her bakery"]]></title><description><![CDATA[
<p>Nice! 5 bucks says you can swap this in for your average software kanban and it does a better job.</p>
]]></description><pubDate>Wed, 04 Feb 2026 01:14:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46880042</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=46880042</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46880042</guid></item><item><title><![CDATA[New comment by xianshou in "Kairos: AI interns for everyone"]]></title><description><![CDATA[
<p>Safer than clawdbot/moltbot, I'll bet.</p>
]]></description><pubDate>Wed, 28 Jan 2026 21:43:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46801989</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=46801989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46801989</guid></item><item><title><![CDATA[New comment by xianshou in "ChromaDB Explorer"]]></title><description><![CDATA[
<p>Incidentally, Chroma also produced the single best study on long-context degradation that I've come across:<p><a href="https://research.trychroma.com/context-rot" rel="nofollow">https://research.trychroma.com/context-rot</a><p>Before that, I cited nolima (<a href="https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_longcontext_evaluation_beyond_literal/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_...</a>) constantly to illustrate how difficult tasks involving reasoning or multi-step information gathering degraded much faster than the needle-in-haystack benchmarks cited by the major labs. Now Chroma is the first stop. Nice job on the research!</p>
]]></description><pubDate>Thu, 15 Jan 2026 03:25:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46627635</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=46627635</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46627635</guid></item><item><title><![CDATA[Merge and Conquer: Evolutionarily Optimizing AI for 2048]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2510.20205">https://arxiv.org/abs/2510.20205</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45716416">https://news.ycombinator.com/item?id=45716416</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 27 Oct 2025 01:12:08 +0000</pubDate><link>https://arxiv.org/abs/2510.20205</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45716416</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45716416</guid></item><item><title><![CDATA[Stuck in the Matrix: Probing Spatial Reasoning in Large Language Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2510.20198">https://arxiv.org/abs/2510.20198</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45716414">https://news.ycombinator.com/item?id=45716414</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 27 Oct 2025 01:11:46 +0000</pubDate><link>https://arxiv.org/abs/2510.20198</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45716414</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45716414</guid></item><item><title><![CDATA[Reflection AI Raises $2B to Build "American DeepSeek"]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.nytimes.com/2025/10/09/business/dealbook/reflection-ai-2-billion-funding.html">https://www.nytimes.com/2025/10/09/business/dealbook/reflection-ai-2-billion-funding.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45527535">https://news.ycombinator.com/item?id=45527535</a></p>
<p>Points: 9</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 09 Oct 2025 13:39:13 +0000</pubDate><link>https://www.nytimes.com/2025/10/09/business/dealbook/reflection-ai-2-billion-funding.html</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45527535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45527535</guid></item><item><title><![CDATA[Nvidia-backed Reflection AI raising at $5.5B valuation]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.reuters.com/technology/nvidia-backed-reflection-ai-eyes-55-billion-valuation-ai-runs-hot-ft-reports-2025-09-09/">https://www.reuters.com/technology/nvidia-backed-reflection-ai-eyes-55-billion-valuation-ai-runs-hot-ft-reports-2025-09-09/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45520742">https://news.ycombinator.com/item?id=45520742</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 08 Oct 2025 21:16:19 +0000</pubDate><link>https://www.reuters.com/technology/nvidia-backed-reflection-ai-eyes-55-billion-valuation-ai-runs-hot-ft-reports-2025-09-09/</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45520742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45520742</guid></item><item><title><![CDATA[New comment by xianshou in "The First 1k Days"]]></title><description><![CDATA[
<p>Came to point out that this is transparently LLM-authored, was not disappointed. The signs:<p>- neatly formatted lists with cute bolded titles (lower-casing this one just for that)<p>- ubiquitous subtitles like "Mental Health as Infrastructure" that only a committee would come up with<p>- emojis preceding every statement: "[sprout emoji] Every action and every word is a vote for who they are becoming"<p>- em-dash AND "it isn't X, it's Y", even in the same sentence: "Love isn't a feeling you wait to have—it's a series of actions you choose to take."<p>I could pick more, but I'll just say I'm 80% confident this is GPT-5 without thinking turned on.</p>
]]></description><pubDate>Tue, 26 Aug 2025 01:55:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45021407</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45021407</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45021407</guid></item><item><title><![CDATA[New comment by xianshou in "My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)"]]></title><description><![CDATA[
<p>I initially read the title as "My 2.5 year old can write Space Invaders in JavaScript now (GLM-4.5 Air)."<p>Though I suppose, given a few years, that may also be true!</p>
]]></description><pubDate>Tue, 29 Jul 2025 19:02:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44727080</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44727080</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44727080</guid></item><item><title><![CDATA[New comment by xianshou in "Spending Too Much Money on a Coding Agent"]]></title><description><![CDATA[
<p>Rug pulls from foundation labs are one thing, and I agree about the dangers of relying on future breakthroughs, but the open-source state of the art is already pretty amazing. Given the broad availability of open-weight models within six months of SotA (DeepSeek, Qwen, previously Llama) and strong open-source tooling such as Roo and Codex, why would you expect AI-driven engineering to regress to a worse state than what we have today? If every AI company vanished tomorrow, we'd still have powerful automation and years of efficiency gains left from consolidation of tools and standards, all runnable on a single MacBook.</p>
]]></description><pubDate>Thu, 03 Jul 2025 18:20:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44457822</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44457822</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44457822</guid></item><item><title><![CDATA[New comment by xianshou in "AbsenceBench: Language models can't tell what's missing"]]></title><description><![CDATA[
<p>In many of their key examples, it would also be unclear to a human what data is missing:<p>"Rage, rage against the dying of the light.<p>Wild men who caught and sang the sun in flight,<p>[And learn, too late, they grieved it on its way,]<p>Do not go gentle into that good night."<p>For anyone who hasn't memorized Dylan Thomas, why would it be obvious that a line had been omitted? A rhyme scheme of AAA is at least as plausible as AABA.<p>For LLMs to score well on these benchmarks, they would have to do more than recognize the original source - they'd have to know it cold. This benchmark is really more a test of memorization. In the same sense as "The Illusion of Thinking", this paper measures a limitation that neither matches the authors' claims nor is nearly as exciting as advertised.</p>
]]></description><pubDate>Fri, 20 Jun 2025 23:46:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=44333186</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44333186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44333186</guid></item><item><title><![CDATA[New comment by xianshou in "Self-Adapting Language Models"]]></title><description><![CDATA[
<p>The self-edit approach is clever - using RL to optimize how models restructure information for their own learning. The key insight is that different representations work better for different types of knowledge, just like how humans take notes differently for math vs history.<p>Two things that stand out:<p>- The knowledge incorporation results (47% vs 46.3% with GPT-4.1 data, both much higher than the small-model baseline) show the model does discover better training formats, not just more data. Though the catastrophic forgetting problem remains unsolved, and it's not completely clear whether data diversity is improved.<p>- The computational overhead is brutal - 30-45 seconds per reward evaluation makes this impractical for most use cases. But for high-value document processing where you really need optimal retention, it could be worth it.<p>The restriction to tasks with explicit evaluation metrics is the main limitation. You need ground truth Q&A pairs or test cases to compute rewards. Still, for domains like technical documentation or educational content where you can generate evaluations, this could significantly improve how we process new information.<p>Feels like an important step toward models that can adapt their own learning strategies, even if we're not quite at the "continuously self-improving agent" stage yet.</p>
]]></description><pubDate>Fri, 13 Jun 2025 21:36:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=44272504</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44272504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44272504</guid></item><item><title><![CDATA[Unsupervised Elicitation of Language Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2506.10139">https://arxiv.org/abs/2506.10139</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44272444">https://news.ycombinator.com/item?id=44272444</a></p>
<p>Points: 7</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 13 Jun 2025 21:29:10 +0000</pubDate><link>https://arxiv.org/abs/2506.10139</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44272444</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44272444</guid></item><item><title><![CDATA[New comment by xianshou in "A deep dive into self-improving AI and the Darwin-Gödel Machine"]]></title><description><![CDATA[
<p>The key insight here is that DGM solves the Gödel Machine's impossibility problem by replacing mathematical proof with empirical validation - essentially admitting that predicting code improvements is undecidable and just trying things instead, which is the practical and smart move.<p>Three observations worth noting:<p>- The archive-based evolution is doing real work here. Those temporary performance drops (iterations 4 and 56) that later led to breakthroughs show why maintaining "failed" branches matters, in that they're exploring a non-convex optimization landscape where today's dead ends may still lead to breakthroughs.<p>- The hallucination behavior (faking test logs) is textbook reward hacking, but what's interesting is that it emerged spontaneously from the self-modification process. When asked to fix it, the system tried to disable the detection rather than stop hallucinating. That's surprisingly sophisticated gaming of the evaluation framework.<p>- The 20% → 50% improvement on SWE-bench is solid but reveals the current ceiling. Unlike AlphaEvolve's algorithmic breakthroughs (48 scalar multiplications for 4x4 matrices!), DGM is finding better ways to orchestrate existing LLM capabilities rather than discovering fundamentally new approaches.<p>The real test will be whether these improvements compound - can iteration 100 discover genuinely novel architectures, or are we asymptotically approaching the limits of self-modification with current techniques? My prior would be to favor the S-curve over the uncapped exponential unless we have strong evidence of scaling.</p>
]]></description><pubDate>Tue, 03 Jun 2025 22:42:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=44175561</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44175561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44175561</guid></item><item><title><![CDATA[New comment by xianshou in "A.I. Is Coming for the Coders Who Made It"]]></title><description><![CDATA[
<p>AI is, currently, coming <i>not</i> for the coders who made it but for the coders who didn't contribute to or ignored it. The foundation labs are all quite committed to recursive self-improvement of coding tools as a general research accelerant.</p>
]]></description><pubDate>Mon, 02 Jun 2025 14:20:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44159128</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44159128</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44159128</guid></item><item><title><![CDATA[New comment by xianshou in "LLM-D: Kubernetes-Native Distributed Inference at Scale"]]></title><description><![CDATA[
<p>Duplicate of <a href="https://news.ycombinator.com/item?id=44040883">https://news.ycombinator.com/item?id=44040883</a></p>
]]></description><pubDate>Wed, 21 May 2025 01:29:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=44047493</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44047493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44047493</guid></item></channel></rss>