<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: lewtun</title><link>https://news.ycombinator.com/user?id=lewtun</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 10 May 2026 08:45:49 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=lewtun" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by lewtun in "AlphaEvolve: Gemini-powered coding agent scaling impact across fields"]]></title><description><![CDATA[
<p>Shameless plug: <a href="https://huggingface.co/spaces/smolagents/ml-intern" rel="nofollow">https://huggingface.co/spaces/smolagents/ml-intern</a><p>It’s a simple harness around Opus, but with tight integration with Hugging Face infra, so the agent can read papers, test code, and launch experiments.</p>
]]></description><pubDate>Thu, 07 May 2026 16:14:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48051191</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=48051191</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48051191</guid></item><item><title><![CDATA[New comment by lewtun in "I Just Want Simple S3"]]></title><description><![CDATA[
<p>Hugging Face Buckets are pretty simple: <a href="https://huggingface.co/docs/huggingface_hub/en/guides/buckets" rel="nofollow">https://huggingface.co/docs/huggingface_hub/en/guides/bucket...</a><p>Disclaimer: I work at HF</p>
]]></description><pubDate>Mon, 13 Apr 2026 21:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47758377</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=47758377</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47758377</guid></item><item><title><![CDATA[New comment by lewtun in "When models manipulate manifolds: The geometry of a counting task"]]></title><description><![CDATA[
<p>The analogy stems from the notion that neural nets are "grown" rather than "engineered". Chris Olah has an old but good post with some specific examples: <a href="https://colah.github.io/notes/bio-analogies/" rel="nofollow">https://colah.github.io/notes/bio-analogies/</a></p>
]]></description><pubDate>Mon, 03 Nov 2025 11:07:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45797861</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45797861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45797861</guid></item><item><title><![CDATA[New comment by lewtun in "The Smol Training Playbook: The Secrets to Building World-Class LLMs"]]></title><description><![CDATA[
<p>Thanks! I expect the book will remain relevant as long as the Transformers architecture does. That’s why we mostly focus on topics we think will stand the test of time, but let’s see how that plays out :)</p>
]]></description><pubDate>Sun, 02 Nov 2025 13:42:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45790288</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45790288</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45790288</guid></item><item><title><![CDATA[New comment by lewtun in "The Smol Training Playbook: The Secrets to Building World-Class LLMs"]]></title><description><![CDATA[
<p>In the specific case of SmolLM, it originates from the meme in this dataset <a href="https://huggingface.co/datasets/bigcode/the-stack-smol" rel="nofollow">https://huggingface.co/datasets/bigcode/the-stack-smol</a></p>
]]></description><pubDate>Sun, 02 Nov 2025 06:14:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45788176</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45788176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45788176</guid></item><item><title><![CDATA[New comment by lewtun in "The Smol Training Playbook: The Secrets to Building World-Class LLMs"]]></title><description><![CDATA[
<p>Hi, Lewis here (one of the co-authors). Happy to answer any questions people have about the book :)</p>
]]></description><pubDate>Sat, 01 Nov 2025 21:50:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=45785734</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45785734</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45785734</guid></item><item><title><![CDATA[PyTorch OpenEnv]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/meta-pytorch/OpenEnv">https://github.com/meta-pytorch/OpenEnv</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45683252">https://news.ycombinator.com/item?id=45683252</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 23 Oct 2025 15:49:13 +0000</pubDate><link>https://github.com/meta-pytorch/OpenEnv</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45683252</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45683252</guid></item><item><title><![CDATA[Scaling Laws for Reinforcement Learning]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/papers/2510.13786">https://huggingface.co/papers/2510.13786</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45607058">https://news.ycombinator.com/item?id=45607058</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 16 Oct 2025 16:01:16 +0000</pubDate><link>https://huggingface.co/papers/2510.13786</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45607058</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45607058</guid></item><item><title><![CDATA[New comment by lewtun in "LoRA Without Regret"]]></title><description><![CDATA[
<p>For those interested in playing with an implementation of these ideas, my colleagues at HF made some recipes here: <a href="https://github.com/huggingface/trl/blob/main/docs/source/lora_without_regret.md" rel="nofollow">https://github.com/huggingface/trl/blob/main/docs/source/lor...</a></p>
]]></description><pubDate>Sat, 04 Oct 2025 21:00:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=45476663</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45476663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45476663</guid></item><item><title><![CDATA[New comment by lewtun in "Quantum Mechanics, Concise Book"]]></title><description><![CDATA[
<p>“QED and the Men Who Made It” [1] might be close to what you’re after for quantum theory at least. Unlike other popular accounts, it gets quite technical and covers a lot of the historical dead ends that people had during the development of quantum field theory.<p>[1] <a href="https://press.princeton.edu/books/paperback/9780691033273/qed-and-the-men-who-made-it" rel="nofollow">https://press.princeton.edu/books/paperback/9780691033273/qe...</a></p>
]]></description><pubDate>Sat, 06 Sep 2025 09:47:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=45147923</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45147923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45147923</guid></item><item><title><![CDATA[New comment by lewtun in "Adaptive LLM routing under budget constraints"]]></title><description><![CDATA[
<p>> We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB<p>Academics are pretty creative at naming their creations</p>
]]></description><pubDate>Mon, 01 Sep 2025 19:59:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45096080</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=45096080</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45096080</guid></item><item><title><![CDATA[New comment by lewtun in "Smollm3: Smol, multilingual, long-context reasoner LLM"]]></title><description><![CDATA[
<p>Indeed, we opted for offline methods like Anchored Preference Optimization, as we found in the Open R1 project that doing multi-task RL on small models is quite a hassle to get right. With offline methods, you focus much more on dataset curation / generation, which provides faster iteration cycles at the model scale we’re dealing with!</p>
]]></description><pubDate>Tue, 08 Jul 2025 18:38:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=44502761</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=44502761</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44502761</guid></item><item><title><![CDATA[New comment by lewtun in "PDF to Text, a challenging problem"]]></title><description><![CDATA[
<p>> The absolute best way of doing this these days is likely through a vision based machine learning model, but that is an approach that is very far away from scaling to processing hundreds of gigabytes of PDF files off a single server with no GPU.<p>SmolDocling is pretty fast and the ONNX weights can be scaled to many CPUs: <a href="https://huggingface.co/ds4sd/SmolDocling-256M-preview" rel="nofollow">https://huggingface.co/ds4sd/SmolDocling-256M-preview</a><p>Not sure what time scale the author had in mind for processing GBs of PDFs, but the future might be closer than “very far away”</p>
]]></description><pubDate>Wed, 14 May 2025 08:39:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=43982288</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=43982288</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43982288</guid></item><item><title><![CDATA[DESI results show dark energy may be evolving over time]]></title><description><![CDATA[
<p>Article URL: <a href="https://newscenter.lbl.gov/2025/03/19/new-desi-results-strengthen-hints-that-dark-energy-may-evolve/">https://newscenter.lbl.gov/2025/03/19/new-desi-results-strengthen-hints-that-dark-energy-may-evolve/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43426470">https://news.ycombinator.com/item?id=43426470</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 20 Mar 2025 17:48:46 +0000</pubDate><link>https://newscenter.lbl.gov/2025/03/19/new-desi-results-strengthen-hints-that-dark-energy-may-evolve/</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=43426470</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43426470</guid></item><item><title><![CDATA[DocumentAI with 256M Parameters]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo">https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43425862">https://news.ycombinator.com/item?id=43425862</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 20 Mar 2025 17:07:50 +0000</pubDate><link>https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=43425862</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43425862</guid></item><item><title><![CDATA[220k reasoning traces from DeepSeek-R1]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/blog/open-r1/update-2">https://huggingface.co/blog/open-r1/update-2</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43009549">https://news.ycombinator.com/item?id=43009549</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 11 Feb 2025 06:12:16 +0000</pubDate><link>https://huggingface.co/blog/open-r1/update-2</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=43009549</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43009549</guid></item><item><title><![CDATA[New comment by lewtun in "Show HN: A real time AI video agent with under 1 second of latency"]]></title><description><![CDATA[
<p>I gave the demo a spin and it’s pretty nice! One thing I noticed is that the avatar doesn’t seem to be aware of its surroundings: for example, I asked it why it was wearing a cowboy hat and it was adamant that it wasn’t wearing a hat at all :)</p>
]]></description><pubDate>Wed, 02 Oct 2024 06:05:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=41717642</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=41717642</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41717642</guid></item><item><title><![CDATA[New comment by lewtun in "RLHF is just barely RL"]]></title><description><![CDATA[
<p>> I expect language models to also get crazy good at mathematical theorem proving<p>Indeed, systems like AlphaProof / AlphaGeometry are already able to win a silver medal at the IMO, and the former relies on Lean for theorem verification [1]. On the open source side, I really like the ideas in LeanDojo [2], which use a form of RAG to assist the LLM with premise selection.<p>[1] <a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/" rel="nofollow">https://deepmind.google/discover/blog/ai-solves-imo-problems...</a><p>[2] <a href="https://leandojo.org/" rel="nofollow">https://leandojo.org/</a></p>
]]></description><pubDate>Thu, 08 Aug 2024 12:51:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=41190984</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=41190984</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41190984</guid></item><item><title><![CDATA[The largest math dataset of Olympiad problems for training LLMs]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/datasets/AI-MO/NuminaMath-CoT">https://huggingface.co/datasets/AI-MO/NuminaMath-CoT</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41027167">https://news.ycombinator.com/item?id=41027167</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 21 Jul 2024 18:43:20 +0000</pubDate><link>https://huggingface.co/datasets/AI-MO/NuminaMath-CoT</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=41027167</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41027167</guid></item><item><title><![CDATA[New comment by lewtun in "[dead]"]]></title><description><![CDATA[
<p>Hello everyone, we just did a speed run with Argilla and KAIST AI to fine-tune the beefy new Mixtral model with some new techniques that came out recently. More details in the model card - enjoy!</p>
]]></description><pubDate>Thu, 11 Apr 2024 23:07:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40007770</link><dc:creator>lewtun</dc:creator><comments>https://news.ycombinator.com/item?id=40007770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40007770</guid></item></channel></rss>