<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: xianshou</title><link>https://news.ycombinator.com/user?id=xianshou</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 08 Apr 2026 00:20:07 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=xianshou" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by xianshou in "A Recipe for Steganogravy"]]></title><description><![CDATA[
<p>Even as someone firmly on the other side of the AI debate, I must appreciate the craft.<p>Now, to give Claude the steganogravy skill...</p>
]]></description><pubDate>Fri, 03 Apr 2026 13:29:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47626463</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47626463</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47626463</guid></item><item><title><![CDATA[New comment by xianshou in "Universal Claude.md – cut Claude output tokens"]]></title><description><![CDATA[
<p>From the file: "Answer is always line 1. Reasoning comes after, never before."<p>LLMs are autoregressive (filling in the completion of what came before), so you'd better have thinking mode on, or the "reasoning" is pure confirmation bias seeded by the answer that gets locked in via the first output tokens.</p>
]]></description><pubDate>Tue, 31 Mar 2026 02:13:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47581985</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47581985</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47581985</guid></item><item><title><![CDATA[New comment by xianshou in "I am definitely missing the pre-AI writing era"]]></title><description><![CDATA[
<p>I appreciate not having to read this guy again.</p>
]]></description><pubDate>Tue, 31 Mar 2026 01:01:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47581571</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47581571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47581571</guid></item><item><title><![CDATA[New comment by xianshou in "The First Fully General Computer Action Model"]]></title><description><![CDATA[
<p>Great work! Why no benchmarks though?</p>
]]></description><pubDate>Thu, 26 Feb 2026 15:11:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47167147</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=47167147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47167147</guid></item><item><title><![CDATA[New comment by xianshou in "Show HN: Craftplan – I built my wife a production management tool for her bakery"]]></title><description><![CDATA[
<p>Nice! 5 bucks says you can swap this in for your average software kanban and it does a better job.</p>
]]></description><pubDate>Wed, 04 Feb 2026 01:14:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46880042</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=46880042</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46880042</guid></item><item><title><![CDATA[New comment by xianshou in "Kairos: AI interns for everyone"]]></title><description><![CDATA[
<p>Safer than clawdbot/moltbot, I'll bet.</p>
]]></description><pubDate>Wed, 28 Jan 2026 21:43:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46801989</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=46801989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46801989</guid></item><item><title><![CDATA[New comment by xianshou in "ChromaDB Explorer"]]></title><description><![CDATA[
<p>Incidentally, Chroma also produced the single best study on long-context degradation that I've come across:<p><a href="https://research.trychroma.com/context-rot" rel="nofollow">https://research.trychroma.com/context-rot</a><p>Before that, I cited nolima (<a href="https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_longcontext_evaluation_beyond_literal/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_...</a>) constantly to illustrate how difficult tasks involving reasoning or multi-step information gathering degraded much faster than the needle-in-haystack benchmarks cited by the major labs. Now Chroma is the first stop. Nice job on the research!</p>
]]></description><pubDate>Thu, 15 Jan 2026 03:25:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46627635</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=46627635</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46627635</guid></item><item><title><![CDATA[Merge and Conquer: Evolutionarily Optimizing AI for 2048]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2510.20205">https://arxiv.org/abs/2510.20205</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45716416">https://news.ycombinator.com/item?id=45716416</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 27 Oct 2025 01:12:08 +0000</pubDate><link>https://arxiv.org/abs/2510.20205</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45716416</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45716416</guid></item><item><title><![CDATA[Stuck in the Matrix: Probing Spatial Reasoning in Large Language Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2510.20198">https://arxiv.org/abs/2510.20198</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45716414">https://news.ycombinator.com/item?id=45716414</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 27 Oct 2025 01:11:46 +0000</pubDate><link>https://arxiv.org/abs/2510.20198</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45716414</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45716414</guid></item><item><title><![CDATA[Reflection AI Raises $2B to Build "American DeepSeek"]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.nytimes.com/2025/10/09/business/dealbook/reflection-ai-2-billion-funding.html">https://www.nytimes.com/2025/10/09/business/dealbook/reflection-ai-2-billion-funding.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45527535">https://news.ycombinator.com/item?id=45527535</a></p>
<p>Points: 9</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 09 Oct 2025 13:39:13 +0000</pubDate><link>https://www.nytimes.com/2025/10/09/business/dealbook/reflection-ai-2-billion-funding.html</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45527535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45527535</guid></item><item><title><![CDATA[Nvidia-backed Reflection AI raising at $5.5B valuation]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.reuters.com/technology/nvidia-backed-reflection-ai-eyes-55-billion-valuation-ai-runs-hot-ft-reports-2025-09-09/">https://www.reuters.com/technology/nvidia-backed-reflection-ai-eyes-55-billion-valuation-ai-runs-hot-ft-reports-2025-09-09/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45520742">https://news.ycombinator.com/item?id=45520742</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 08 Oct 2025 21:16:19 +0000</pubDate><link>https://www.reuters.com/technology/nvidia-backed-reflection-ai-eyes-55-billion-valuation-ai-runs-hot-ft-reports-2025-09-09/</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45520742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45520742</guid></item><item><title><![CDATA[New comment by xianshou in "The First 1k Days"]]></title><description><![CDATA[
<p>Came to point out that this is transparently LLM-authored, was not disappointed. The signs:<p>- neatly formatted lists with cute bolded titles (lower-casing this one just for that)<p>- ubiquitous subtitles like "Mental Health as Infrastructure" that only a committee would come up with<p>- emojis preceding every statement: "[sprout emoji] Every action and every word is a vote for who they are becoming"<p>- em-dash AND "it isn't X, it's Y", even in the same sentence: "Love isn't a feeling you wait to have—it's a series of actions you choose to take."<p>I could pick more, but I'll just say I'm 80% confident this is GPT-5 without thinking turned on.</p>
]]></description><pubDate>Tue, 26 Aug 2025 01:55:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45021407</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=45021407</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45021407</guid></item><item><title><![CDATA[New comment by xianshou in "My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)"]]></title><description><![CDATA[
<p>I initially read the title as "My 2.5 year old can write Space Invaders in JavaScript now (GLM-4.5 Air)."<p>Though I suppose, given a few years, that may also be true!</p>
]]></description><pubDate>Tue, 29 Jul 2025 19:02:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44727080</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44727080</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44727080</guid></item><item><title><![CDATA[New comment by xianshou in "Spending Too Much Money on a Coding Agent"]]></title><description><![CDATA[
<p>Rug pulls from foundation labs are one thing, and I agree about the dangers of relying on future breakthroughs, but the open-source state of the art is already pretty amazing. Given the broad availability of open-weight models within six months of SotA (DeepSeek, Qwen, previously Llama) and strong open-source tooling such as Roo and Codex, why would you expect AI-driven engineering to regress to a worse state than what we have today? If every AI company vanished tomorrow, we'd still have powerful automation and years of efficiency gains left from consolidation of tools and standards, all runnable on a single MacBook.</p>
]]></description><pubDate>Thu, 03 Jul 2025 18:20:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44457822</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44457822</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44457822</guid></item><item><title><![CDATA[New comment by xianshou in "AbsenceBench: Language models can't tell what's missing"]]></title><description><![CDATA[
<p>In many of their key examples, it would also be unclear to a human what data is missing:<p>"Rage, rage against the dying of the light.<p>Wild men who caught and sang the sun in flight,<p>[And learn, too late, they grieved it on its way,]<p>Do not go gentle into that good night."<p>For anyone who hasn't memorized Dylan Thomas, why would it be obvious that a line had been omitted? A rhyme scheme of AAA is at least as plausible as AABA.<p>For LLMs to score well on these benchmarks, they would have to do more than recognize the original source - they'd have to know it cold. This benchmark is really more a test of memorization. In the same sense as "The Illusion of Thinking", this paper measures a limitation that neither matches the authors' claims nor is nearly as exciting as advertised.</p>
]]></description><pubDate>Fri, 20 Jun 2025 23:46:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=44333186</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44333186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44333186</guid></item><item><title><![CDATA[New comment by xianshou in "Self-Adapting Language Models"]]></title><description><![CDATA[
<p>The self-edit approach is clever - using RL to optimize how models restructure information for their own learning. The key insight is that different representations work better for different types of knowledge, just like how humans take notes differently for math vs history.<p>Two things that stand out:<p>- The knowledge incorporation results (47% vs 46.3% with GPT-4.1 data, both much higher than the small-model baseline) show the model does discover better training formats, not just more data. Though the catastrophic forgetting problem remains unsolved, and it's not completely clear whether data diversity is improved.<p>- The computational overhead is brutal - 30-45 seconds per reward evaluation makes this impractical for most use cases. But for high-value document processing where you really need optimal retention, it could be worth it.<p>The restriction to tasks with explicit evaluation metrics is the main limitation. You need ground truth Q&A pairs or test cases to compute rewards. Still, for domains like technical documentation or educational content where you can generate evaluations, this could significantly improve how we process new information.<p>Feels like an important step toward models that can adapt their own learning strategies, even if we're not quite at the "continuously self-improving agent" stage yet.</p>
]]></description><pubDate>Fri, 13 Jun 2025 21:36:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=44272504</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44272504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44272504</guid></item><item><title><![CDATA[Unsupervised Elicitation of Language Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2506.10139">https://arxiv.org/abs/2506.10139</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44272444">https://news.ycombinator.com/item?id=44272444</a></p>
<p>Points: 7</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 13 Jun 2025 21:29:10 +0000</pubDate><link>https://arxiv.org/abs/2506.10139</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44272444</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44272444</guid></item><item><title><![CDATA[New comment by xianshou in "A deep dive into self-improving AI and the Darwin-Gödel Machine"]]></title><description><![CDATA[
<p>The key insight here is that DGM solves the Gödel Machine's impossibility problem by replacing mathematical proof with empirical validation - essentially admitting that predicting code improvements is undecidable and just trying things instead, which is the practical and smart move.<p>Three observations worth noting:<p>- The archive-based evolution is doing real work here. Those temporary performance drops (iterations 4 and 56) that later led to breakthroughs show why maintaining "failed" branches matters, in that they're exploring a non-convex optimization landscape where today's dead ends may still lead to breakthroughs.<p>- The hallucination behavior (faking test logs) is textbook reward hacking, but what's interesting is that it emerged spontaneously from the self-modification process. When asked to fix it, the system tried to disable the detection rather than stop hallucinating. That's surprisingly sophisticated gaming of the evaluation framework.<p>- The 20% → 50% improvement on SWE-bench is solid but reveals the current ceiling. Unlike AlphaEvolve's algorithmic breakthroughs (48 scalar multiplications for 4x4 matrices!), DGM is finding better ways to orchestrate existing LLM capabilities rather than discovering fundamentally new approaches.<p>The real test will be whether these improvements compound - can iteration 100 discover genuinely novel architectures, or are we asymptotically approaching the limits of self-modification with current techniques? My prior would be to favor the S-curve over the uncapped exponential unless we have strong evidence of scaling.</p>
]]></description><pubDate>Tue, 03 Jun 2025 22:42:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=44175561</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44175561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44175561</guid></item><item><title><![CDATA[New comment by xianshou in "A.I. Is Coming for the Coders Who Made It"]]></title><description><![CDATA[
<p>AI is, currently, coming <i>not</i> for the coders who made it but for the coders who didn't contribute to or ignored it. The foundation labs are all quite committed to recursive self-improvement of coding tools as a general research accelerant.</p>
]]></description><pubDate>Mon, 02 Jun 2025 14:20:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44159128</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44159128</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44159128</guid></item><item><title><![CDATA[New comment by xianshou in "LLM-D: Kubernetes-Native Distributed Inference at Scale"]]></title><description><![CDATA[
<p>Duplicate of <a href="https://news.ycombinator.com/item?id=44040883">https://news.ycombinator.com/item?id=44040883</a></p>
]]></description><pubDate>Wed, 21 May 2025 01:29:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=44047493</link><dc:creator>xianshou</dc:creator><comments>https://news.ycombinator.com/item?id=44047493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44047493</guid></item></channel></rss>