<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mxwsn</title><link>https://news.ycombinator.com/user?id=mxwsn</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 27 Apr 2026 17:48:16 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mxwsn" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mxwsn in "SWE-bench Verified no longer measures frontier coding capabilities"]]></title><description><![CDATA[
<p>How do you know that width scaling has been the driving force of improvement?</p>
]]></description><pubDate>Sun, 26 Apr 2026 18:26:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47912574</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=47912574</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47912574</guid></item><item><title><![CDATA[New comment by mxwsn in "Show HN: The Hessian of tall-skinny networks is easy to invert"]]></title><description><![CDATA[
<p>The Jacobian <i>is</i> first derivatives, but for a function mapping N to M dimensions. It's the first derivative of every output wrt every input, so it will be an M x N matrix: one row per output, one column per input.<p>The gradient is the special case for functions mapping N dimensions to 1, such as loss functions: the Jacobian is a single 1 x N row, and the gradient is its transpose, an N x 1 vector.</p>
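<p>A quick way to check those shapes concretely - a minimal sketch in JAX, with illustrative functions of my own choosing:
<pre><code>import jax
import jax.numpy as jnp

# f maps N=3 inputs to M=2 outputs
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
print(jax.jacobian(f)(x).shape)  # (2, 3): M x N, one row per output

# A scalar loss (M=1): jax.grad gives the length-N gradient,
# the transpose of the 1 x N Jacobian row
def loss(x):
    return jnp.sum(x ** 2)

print(jax.grad(loss)(x).shape)  # (3,)
</code></pre></p>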
]]></description><pubDate>Thu, 15 Jan 2026 23:15:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46640794</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=46640794</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46640794</guid></item><item><title><![CDATA[New comment by mxwsn in "How does gradient descent work?"]]></title><description><![CDATA[
<p>Wow! The title suggests introductory material, but in my opinion this has strong potential to win test of time awards for research.</p>
]]></description><pubDate>Wed, 08 Oct 2025 04:12:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=45511981</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=45511981</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45511981</guid></item><item><title><![CDATA[New comment by mxwsn in "Sora 2"]]></title><description><![CDATA[
<p>That's really interesting. What if they do a RAG-style search for related videos based on the prompt, and condition on those to generate? That might explain fidelity like this.</p>
]]></description><pubDate>Wed, 01 Oct 2025 04:34:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=45434262</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=45434262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45434262</guid></item><item><title><![CDATA[New comment by mxwsn in "Diffusion Beats Autoregressive in Data-Constrained Settings"]]></title><description><![CDATA[
<p>Why is this not the diffusion training objective? The technique is known as self-conditioning, right? Is it an issue with conditional Tweedie's formula?</p>
]]></description><pubDate>Tue, 23 Sep 2025 03:23:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=45342496</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=45342496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45342496</guid></item><item><title><![CDATA[New comment by mxwsn in "AI is different"]]></title><description><![CDATA[
<p>AI with ability but without responsibility is not enough for dramatic socioeconomic change, I think. For now, the critical unique power of human workers is that you can hold them responsible for things.<p>edit: ability without accountability is the catchier motto :)</p>
]]></description><pubDate>Sat, 16 Aug 2025 02:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44919716</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44919716</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44919716</guid></item><item><title><![CDATA[New comment by mxwsn in "Unlike ChatGPT, Anthropic has doubled down on Artifacts"]]></title><description><![CDATA[
<p>Has anyone come across any really cool artifacts? I'd be curious to see</p>
]]></description><pubDate>Wed, 16 Jul 2025 02:57:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44578257</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44578257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44578257</guid></item><item><title><![CDATA[New comment by mxwsn in "Web3 Onboarding Was a Flop – and Thank Goodness"]]></title><description><![CDATA[
<p>Stablecoins transferred $27 trillion in 2024 - more than Visa and Mastercard combined. This is right in the article.<p>Stablecoins operate on decentralized ledgers (e.g. Ethereum's), which run on decentralized compute. This isn't mentioned explicitly because the target audience already knows it.</p>
]]></description><pubDate>Mon, 07 Jul 2025 03:56:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=44486617</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44486617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44486617</guid></item><item><title><![CDATA[New comment by mxwsn in "Claude 4"]]></title><description><![CDATA[
<p>Gemini has beaten it already, but using a different and notably more helpful harness. The creator has said they think harness design is the most important factor right now, and that the results don't mean much for comparing Claude to Gemini.</p>
]]></description><pubDate>Thu, 22 May 2025 16:51:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44063937</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44063937</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44063937</guid></item><item><title><![CDATA[New comment by mxwsn in "The booming, high-stakes arms race of airline safety videos"]]></title><description><![CDATA[
<p>Huh, I imagined this was because of relaxed regulation.</p>
]]></description><pubDate>Mon, 07 Apr 2025 00:06:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=43606140</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43606140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43606140</guid></item><item><title><![CDATA[New comment by mxwsn in "Deep Learning Is Not So Mysterious or Different"]]></title><description><![CDATA[
<p>Good read, thanks for sharing</p>
]]></description><pubDate>Mon, 17 Mar 2025 18:04:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43391115</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43391115</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43391115</guid></item><item><title><![CDATA[New comment by mxwsn in "Some thoughts on autoregressive models"]]></title><description><![CDATA[
<p>> But what is the original purpose of AI research? I will speak for myself here, but I know many other AI researchers will say the same: the ultimate goal is to understand how humans think. And we think the best (or the funniest) way to understand how humans think is to try to recreate it.<p>Eh. To riff on Dijkstra, this is like submarine engineers saying their ultimate goal is to understand how fish swim.</p>
]]></description><pubDate>Fri, 07 Mar 2025 07:25:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43287945</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43287945</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43287945</guid></item><item><title><![CDATA[New comment by mxwsn in "AI is stifling new tech adoption?"]]></title><description><![CDATA[
<p>This ought to be called the qwerty effect, for how the qwerty keyboard layout can't be displaced at this point. It was at the right place at the right time, even though its main design choices are no longer relevant and there are arguably better layouts like dvorak.<p>Python and React may similarly be enshrined for the future, for being at the right place at the right time.<p>English as a language might be another example.</p>
]]></description><pubDate>Fri, 14 Feb 2025 20:01:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43052445</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43052445</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43052445</guid></item><item><title><![CDATA[New comment by mxwsn in "Mini-R1: Reproduce DeepSeek R1 "Aha Moment""]]></title><description><![CDATA[
<p>What's surprising about this is how sparse the rewards are. Even if the model learns the formatting reward, if it never chances upon a solution there is no feedback to push it toward solving the game more often.<p>So what are the chances of randomly guessing a solution?<p>The toy Countdown dataset here has 3 to 4 numbers, combined with 4 operators (+, -, x, ÷). With 3 numbers there are 2 operator slots, so 3! * 4^2 = 96 ordering/operator combinations (more if you count parenthesizations); with 4 numbers there are 4! * 4^3 = 1536. By the tensorboard log [0], even after just 10 learning steps, the model already has a success rate just below 10%. If we make the simplifying assumption that the model hasn't learned anything in 10 steps, then across 80 chances (8 generations are used per step) a random guesser succeeding at 1/96 on 3-number problems would be expected to get about 0.8 right, while the observed ~8 successes have a binomial tail probability of roughly 2e-6. One interpretation is to take this as a p-value and reject that the model's base success rate is completely random guessing - the base model already solves the 3-number Countdown game at above-chance rates.<p>This aligns with my intuition - I suspect that with proper prompting, LLMs should be able to solve Countdown decently without any training. Though maybe not a 3B model?<p>The model likely "parlays" its successes on 3 numbers to start learning to solve 4 numbers. Or does it? The final learned ~50% success rate matches the frequency of 4-number problems in Jiayi Pan's Countdown dataset [1]. Phil does provide examples of successful 4-number solutions, but maybe the model hasn't become consistent at 4 numbers yet.<p>[0]: <a href="https://www.philschmid.de/static/blog/mini-deepseek-r1/tensorboard-r1.png" rel="nofollow">https://www.philschmid.de/static/blog/mini-deepseek-r1/tenso...</a>
[1]: <a href="https://huggingface.co/datasets/Jiayi-Pan/Countdown-Tasks-3to4/viewer/default/train?f[nums][min]=3&f[nums][max]=4&f[nums][transform]=length" rel="nofollow">https://huggingface.co/datasets/Jiayi-Pan/Countdown-Tasks-3t...</a></p>
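<p>For concreteness, a quick check of that tail probability - a minimal sketch, assuming the 1/96 guess rate and the ~8-of-80 success count described above:
<pre><code>from math import comb

# P(X >= k) for X ~ Binomial(n, p): chance that a pure random
# guesser matches the observed success count
n, k, p = 80, 8, 1 / 96  # 80 generations, ~8 successes, 1/96 per guess

tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(at least {k} successes by chance) = {tail:.0e}")  # ~2e-06
</code></pre></p>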
]]></description><pubDate>Fri, 31 Jan 2025 07:21:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=42885367</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42885367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42885367</guid></item><item><title><![CDATA[New comment by mxwsn in "I still like Sublime Text"]]></title><description><![CDATA[
<p>I used Sublime from 2013 to 2021. It was great. Since then, I've switched to VS Code and haven't looked back.</p>
]]></description><pubDate>Wed, 29 Jan 2025 07:14:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42862418</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42862418</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42862418</guid></item><item><title><![CDATA[New comment by mxwsn in "AI and the Last Mile 2: Subsidiarity"]]></title><description><![CDATA[
<p>Context is a challenge for LLMs, but that challenge feels qualitatively different to me from the challenge of incorporating local context into automated decision-making AI like algorithmic hiring, banking decisions, and real-estate valuation (e.g. Zillow). Those examples are "pre-LLM" machine learning, and it's not clear to me that LLMs are inherently limited in the same way. If anything, LLMs have the potential to handle a much broader variety of local contextual information by ingesting natural language, whereas in non-LLM machine learning systems, how to featurize or represent this information is typically quite bespoke. Take the neighbors practicing death metal in their garage every Sunday and its impact on house valuation - it's harder to get a non-LLM ML system to "understand" such a sparse "feature" than an LLM.</p>
]]></description><pubDate>Thu, 28 Nov 2024 23:19:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=42269336</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42269336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42269336</guid></item><item><title><![CDATA[New comment by mxwsn in "Francois Chollet is leaving Google"]]></title><description><![CDATA[
<p>My interest was piqued, but the extrapolation in [1] is uh... not the most convincing. If there were more data points then sure, maybe</p>
]]></description><pubDate>Thu, 14 Nov 2024 02:19:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=42132502</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42132502</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42132502</guid></item><item><title><![CDATA[New comment by mxwsn in "LLMs Will Always Hallucinate, and We Need to Live with This"]]></title><description><![CDATA[
<p>OK - there's always a nonzero chance of hallucination. There's also a nonzero chance that macroscale objects can quantum-tunnel, but no one argues that we "need to live with" that fact. A theoretical proof that the probability of some event can never reach 0% is nice, but in practice it says little about whether we can decrease that probability exponentially, which is what effectively mitigates risk.</p>
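<p>To put a number on it, a minimal sketch - the per-pass miss rate and the independence assumption here are hypothetical, purely for illustration: if each of k independent verification passes misses a hallucination with probability q, the residual rate q^k is never zero but shrinks exponentially in k.
<pre><code># Hypothetical: residual hallucination rate after k independent
# verification passes, each missing an error with probability q
q = 0.1  # assumed per-pass miss rate, not a measured value
for k in range(1, 6):
    print(f"{k} pass(es): residual rate = {q**k:.0e}")
# prints 1e-01 down to 1e-05: nonzero, but vanishing fast
</code></pre></p>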
]]></description><pubDate>Sat, 14 Sep 2024 17:14:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=41541148</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=41541148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41541148</guid></item><item><title><![CDATA[New comment by mxwsn in "Covering All Birthdays"]]></title><description><![CDATA[
<p>I think so. Parents can also make it happen at their convenience by asking their doctors - we have the technology to induce birth or shift its timing by a few days.</p>
]]></description><pubDate>Thu, 01 Aug 2024 05:57:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41126521</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=41126521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41126521</guid></item><item><title><![CDATA[New comment by mxwsn in "How Does OpenAI Survive?"]]></title><description><![CDATA[
<p>This article is timely and pairs well with Sequoia's $600B question: <a href="https://www.sequoiacap.com/article/ais-600b-question/" rel="nofollow">https://www.sequoiacap.com/article/ais-600b-question/</a> - a figure calculated simply from NVidia's run-rate revenue, which is the cost genAI companies are paying. Where's the profit?<p>Meta's open-source LLM stance makes things spicier, making it challenging for anyone to generate differentiated and lasting profit in the LLM space.<p>At the current pace, the LLM bubble is poised to pop in a year or two - losses can't keep growing forever - barring a transformative, next-generation capability from closed-source AI companies that Meta can't replicate. All eyes on GPT-5.</p>
]]></description><pubDate>Thu, 01 Aug 2024 04:12:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41126084</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=41126084</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41126084</guid></item></channel></rss>