<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mxwsn</title><link>https://news.ycombinator.com/user?id=mxwsn</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 27 Apr 2026 17:48:16 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mxwsn" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mxwsn in "SWE-bench Verified no longer measures frontier coding capabilities"]]></title><description><![CDATA[
<p>How do you know that width scaling has been the driving force of improvement?</p>
]]></description><pubDate>Sun, 26 Apr 2026 18:26:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47912574</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=47912574</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47912574</guid></item><item><title><![CDATA[New comment by mxwsn in "Show HN: The Hessian of tall-skinny networks is easy to invert"]]></title><description><![CDATA[
<p>The Jacobian <i>is</i> first derivatives, but for a function mapping N to M dimensions. It's the first derivative of every output wrt every input, so it will be an M x N matrix: one row per output, one column per input.<p>The gradient is the special case for functions mapping N dimensions to 1, such as loss functions: the Jacobian is a single 1 x N row, and the gradient is its transpose, an N x 1 vector.</p>
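<p>A quick way to check those shapes concretely - a minimal sketch in JAX, with illustrative functions of my own choosing:
<pre><code>import jax
import jax.numpy as jnp

# f maps N=3 inputs to M=2 outputs
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
print(jax.jacobian(f)(x).shape)  # (2, 3): M x N, one row per output

# A scalar loss (M=1): jax.grad gives the length-N gradient,
# the transpose of the 1 x N Jacobian row
def loss(x):
    return jnp.sum(x ** 2)

print(jax.grad(loss)(x).shape)  # (3,)
</code></pre></p>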
]]></description><pubDate>Thu, 15 Jan 2026 23:15:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46640794</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=46640794</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46640794</guid></item><item><title><![CDATA[New comment by mxwsn in "How does gradient descent work?"]]></title><description><![CDATA[
<p>Wow! The title suggests introductory material, but in my opinion this has strong potential to win test of time awards for research.</p>
]]></description><pubDate>Wed, 08 Oct 2025 04:12:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=45511981</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=45511981</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45511981</guid></item><item><title><![CDATA[New comment by mxwsn in "Sora 2"]]></title><description><![CDATA[
<p>That's really interesting. What if they do a RAG-style search for related videos based on the prompt, and condition on those to generate? That might explain fidelity like this.</p>
]]></description><pubDate>Wed, 01 Oct 2025 04:34:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=45434262</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=45434262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45434262</guid></item><item><title><![CDATA[New comment by mxwsn in "Diffusion Beats Autoregressive in Data-Constrained Settings"]]></title><description><![CDATA[
<p>Why is this not the diffusion training objective? The technique is known as self-conditioning, right? Is it an issue with conditional Tweedie's formula?</p>
]]></description><pubDate>Tue, 23 Sep 2025 03:23:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=45342496</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=45342496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45342496</guid></item><item><title><![CDATA[New comment by mxwsn in "AI is different"]]></title><description><![CDATA[
<p>AI with ability but without responsibility is not enough for dramatic socioeconomic change, I think. For now, the critical unique power of human workers is that you can hold them responsible for things.<p>edit: ability without accountability is the catchier motto :)</p>
]]></description><pubDate>Sat, 16 Aug 2025 02:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44919716</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44919716</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44919716</guid></item><item><title><![CDATA[New comment by mxwsn in "Unlike ChatGPT, Anthropic has doubled down on Artifacts"]]></title><description><![CDATA[
<p>Has anyone come across any really cool artifacts? I'd be curious to see</p>
]]></description><pubDate>Wed, 16 Jul 2025 02:57:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44578257</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44578257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44578257</guid></item><item><title><![CDATA[New comment by mxwsn in "Web3 Onboarding Was a Flop – and Thank Goodness"]]></title><description><![CDATA[
<p>Stablecoins transferred $27 trillion in 2024 - more than Visa and Mastercard combined. This is right in the article.<p>Stablecoins operate on decentralized ledgers (e.g. Ethereum's), which run on decentralized compute. This isn't mentioned explicitly because the target audience already knows it.</p>
]]></description><pubDate>Mon, 07 Jul 2025 03:56:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=44486617</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44486617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44486617</guid></item><item><title><![CDATA[New comment by mxwsn in "Claude 4"]]></title><description><![CDATA[
<p>Gemini has beaten it already, but using a different and notably more helpful harness. The creator has said they think harness design is the most important factor right now, and that the results don't mean much for comparing Claude to Gemini.</p>
]]></description><pubDate>Thu, 22 May 2025 16:51:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44063937</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=44063937</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44063937</guid></item><item><title><![CDATA[New comment by mxwsn in "The booming, high-stakes arms race of airline safety videos"]]></title><description><![CDATA[
<p>Huh, I imagined this was because of relaxed regulation.</p>
]]></description><pubDate>Mon, 07 Apr 2025 00:06:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=43606140</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43606140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43606140</guid></item><item><title><![CDATA[New comment by mxwsn in "Deep Learning Is Not So Mysterious or Different"]]></title><description><![CDATA[
<p>Good read, thanks for sharing</p>
]]></description><pubDate>Mon, 17 Mar 2025 18:04:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43391115</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43391115</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43391115</guid></item><item><title><![CDATA[New comment by mxwsn in "Some thoughts on autoregressive models"]]></title><description><![CDATA[
<p>> But what is the original purpose of AI research? I will speak for myself here, but I know many other AI researchers will say the same: the ultimate goal is to understand how humans think. And we think the best (or the funniest) way to understand how humans think is to try to recreate it.<p>Eh. To riff on Dijkstra, this is like submarine engineers saying their ultimate goal is to understand how fish swim.</p>
]]></description><pubDate>Fri, 07 Mar 2025 07:25:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43287945</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43287945</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43287945</guid></item><item><title><![CDATA[New comment by mxwsn in "AI is stifling new tech adoption?"]]></title><description><![CDATA[
<p>This ought to be called the qwerty effect, for how the qwerty keyboard layout can't be displaced at this point. It was at the right place at the right time, even though its main design choices are no longer relevant and there are arguably better layouts like dvorak.<p>Python and React may similarly be enshrined for the future, for being at the right place at the right time.<p>English as a language might be another example.</p>
]]></description><pubDate>Fri, 14 Feb 2025 20:01:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43052445</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=43052445</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43052445</guid></item><item><title><![CDATA[New comment by mxwsn in "Mini-R1: Reproduce DeepSeek R1 "Aha Moment""]]></title><description><![CDATA[
<p>What's surprising about this is how sparse the rewards are. Even if the model learns the formatting reward, if it never chances upon a solution there is no feedback to push it toward solving the game more often.<p>So what are the chances of randomly guessing a solution?<p>The toy Countdown dataset here has 3 to 4 numbers, combined with 4 operators (+, -, x, ÷). With 3 numbers there are 2 operator slots, so 3! * 4^2 = 96 ordering/operator combinations (more if you count parenthesizations); with 4 numbers there are 4! * 4^3 = 1536. By the tensorboard log [0], even after just 10 learning steps, the model already has a success rate just below 10%. If we make the simplifying assumption that the model hasn't learned anything in 10 steps, then across 80 chances (8 generations are used per step) a random guesser succeeding at 1/96 on 3-number problems would be expected to get about 0.8 right, while the observed ~8 successes have a binomial tail probability of roughly 2e-6. One interpretation is to take this as a p-value and reject that the model's base success rate is completely random guessing - the base model already solves the 3-number Countdown game at above-chance rates.<p>This aligns with my intuition - I suspect that with proper prompting, LLMs should be able to solve Countdown decently without any training. Though maybe not a 3B model?<p>The model likely "parlays" its successes on 3 numbers to start learning to solve 4 numbers. Or does it? The final learned ~50% success rate matches the frequency of 4-number problems in Jiayi Pan's Countdown dataset [1]. Phil does provide examples of successful 4-number solutions, but maybe the model hasn't become consistent at 4 numbers yet.<p>[0]: <a href="https://www.philschmid.de/static/blog/mini-deepseek-r1/tensorboard-r1.png" rel="nofollow">https://www.philschmid.de/static/blog/mini-deepseek-r1/tenso...</a>
[1]: <a href="https://huggingface.co/datasets/Jiayi-Pan/Countdown-Tasks-3to4/viewer/default/train?f[nums][min]=3&f[nums][max]=4&f[nums][transform]=length" rel="nofollow">https://huggingface.co/datasets/Jiayi-Pan/Countdown-Tasks-3t...</a></p>
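<p>For concreteness, a quick check of that tail probability - a minimal sketch, assuming the 1/96 guess rate and the ~8-of-80 success count described above:
<pre><code>from math import comb

# P(X >= k) for X ~ Binomial(n, p): chance that a pure random
# guesser matches the observed success count
n, k, p = 80, 8, 1 / 96  # 80 generations, ~8 successes, 1/96 per guess

tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(at least {k} successes by chance) = {tail:.0e}")  # ~2e-06
</code></pre></p>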
]]></description><pubDate>Fri, 31 Jan 2025 07:21:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=42885367</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42885367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42885367</guid></item><item><title><![CDATA[New comment by mxwsn in "I still like Sublime Text"]]></title><description><![CDATA[
<p>I used Sublime from 2013 to 2021. It was great. Since then, I've switched to VS Code and haven't looked back.</p>
]]></description><pubDate>Wed, 29 Jan 2025 07:14:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42862418</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42862418</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42862418</guid></item><item><title><![CDATA[New comment by mxwsn in "AI and the Last Mile 2: Subsidiarity"]]></title><description><![CDATA[
<p>Context is a challenge for LLMs, but that challenge feels qualitatively different to me from the challenge of incorporating local context into automated decision-making AI like algorithmic hiring, banking decisions, and real-estate valuation (e.g. Zillow). Those examples are "pre-LLM" machine learning, and it's not clear to me that LLMs are inherently limited in the same way. If anything, LLMs have the potential to handle a much broader variety of local contextual information by ingesting natural language, whereas in non-LLM machine learning systems, how to featurize or represent this information is typically quite bespoke. Take the neighbors practicing death metal in their garage every Sunday and its impact on house valuation - it's harder to get a non-LLM ML system to "understand" such a sparse "feature" than an LLM.</p>
]]></description><pubDate>Thu, 28 Nov 2024 23:19:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=42269336</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42269336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42269336</guid></item><item><title><![CDATA[New comment by mxwsn in "Francois Chollet is leaving Google"]]></title><description><![CDATA[
<p>My interest was piqued, but the extrapolation in [1] is uh... not the most convincing. If there were more data points then sure, maybe</p>
]]></description><pubDate>Thu, 14 Nov 2024 02:19:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=42132502</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=42132502</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42132502</guid></item><item><title><![CDATA[New comment by mxwsn in "LLMs Will Always Hallucinate, and We Need to Live with This"]]></title><description><![CDATA[
<p>OK - there's always a nonzero chance of hallucination. There's also a nonzero chance that macroscale objects can quantum-tunnel, but no one argues that we "need to live with" that fact. A theoretical proof that the probability of some event can never reach 0% is nice, but in practice it says little about whether we can decrease that probability exponentially, which is what effectively mitigates risk.</p>
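<p>To put a number on it, a minimal sketch - the per-pass miss rate and the independence assumption here are hypothetical, purely for illustration: if each of k independent verification passes misses a hallucination with probability q, the residual rate q^k is never zero but shrinks exponentially in k.
<pre><code># Hypothetical: residual hallucination rate after k independent
# verification passes, each missing an error with probability q
q = 0.1  # assumed per-pass miss rate, not a measured value
for k in range(1, 6):
    print(f"{k} pass(es): residual rate = {q**k:.0e}")
# prints 1e-01 down to 1e-05: nonzero, but vanishing fast
</code></pre></p>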
]]></description><pubDate>Sat, 14 Sep 2024 17:14:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=41541148</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=41541148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41541148</guid></item><item><title><![CDATA[New comment by mxwsn in "Covering All Birthdays"]]></title><description><![CDATA[
<p>I think so. Parents can also make it happen at their convenience by asking their doctors - we have the technology to induce birth or shift its timing by a few days.</p>
]]></description><pubDate>Thu, 01 Aug 2024 05:57:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41126521</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=41126521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41126521</guid></item><item><title><![CDATA[New comment by mxwsn in "How Does OpenAI Survive?"]]></title><description><![CDATA[
<p>This article is timely and pairs well with Sequoia's $600B question: <a href="https://www.sequoiacap.com/article/ais-600b-question/" rel="nofollow">https://www.sequoiacap.com/article/ais-600b-question/</a> - a figure calculated simply from NVidia's run-rate revenue, which is the cost genAI companies are paying. Where's the profit?<p>Meta's open-source LLM stance makes things spicier, making it challenging for anyone to generate differentiated and lasting profit in the LLM space.<p>At the current pace, the LLM bubble is poised to pop in a year or two - losses can't keep growing forever - barring a transformative, next-generation capability from closed-source AI companies that Meta can't replicate. All eyes on GPT-5.</p>
]]></description><pubDate>Thu, 01 Aug 2024 04:12:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41126084</link><dc:creator>mxwsn</dc:creator><comments>https://news.ycombinator.com/item?id=41126084</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41126084</guid></item></channel></rss>