<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: bytepoet</title><link>https://news.ycombinator.com/user?id=bytepoet</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 11:59:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=bytepoet" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by bytepoet in "GPUs: Anatomy of high performance matmul kernels"]]></title><description><![CDATA[
<p>Wonderful! Great, detailed explanation. I look forward to reading the vLLM post as well.</p>
]]></description><pubDate>Tue, 30 Sep 2025 14:32:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=45425958</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=45425958</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45425958</guid></item><item><title><![CDATA[New comment by bytepoet in "Compiling LLMs into a MegaKernel: A path to low-latency inference"]]></title><description><![CDATA[
<p>Thanks for the input. That's very helpful to know.<p>I look forward to following Mirage development.</p>
]]></description><pubDate>Fri, 20 Jun 2025 00:41:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=44323821</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=44323821</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44323821</guid></item><item><title><![CDATA[New comment by bytepoet in "Compiling LLMs into a MegaKernel: A path to low-latency inference"]]></title><description><![CDATA[
<p>This is very cool. I enjoyed going through the writeup and GitHub README.<p>I was wondering whether these same optimizations could be brought to bear on training as well, rather than only inference. I guess the challenge there is fusing backward computations with gradient communication.<p>I also saw that this currently does not handle dynamic workloads such as MoE. I recently came across a paper that does exactly that:<p>FlashDMoE: Fast Distributed MoE in a Single Kernel - <a href="https://arxiv.org/pdf/2506.04667" rel="nofollow">https://arxiv.org/pdf/2506.04667</a></p>
]]></description><pubDate>Thu, 19 Jun 2025 20:36:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44322313</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=44322313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44322313</guid></item><item><title><![CDATA[New comment by bytepoet in "Writing in the Age of LLMs"]]></title><description><![CDATA[
<p>I really enjoyed reading this, particularly the first part, where the author is specific about why we invariably (and often vaguely) find LLM-generated text slightly off.<p>I cherish writing and find it a wonderful tool for thinking. So far, I've tried to do technical writing without much LLM help. I do run the final draft through a good model to catch factual inaccuracies.</p>
]]></description><pubDate>Tue, 17 Jun 2025 19:13:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=44302764</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=44302764</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44302764</guid></item><item><title><![CDATA[New comment by bytepoet in "How much do language models memorize?"]]></title><description><![CDATA[
<p>I enjoyed reading this paper. The experiments are well-designed and the writing is clear.<p>Much work on the generalization of ML models deals with asymptotic bounds. Here, there's a precise way of measuring memorization, even for relatively large models with millions of parameters.</p>
]]></description><pubDate>Fri, 06 Jun 2025 08:03:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44198741</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=44198741</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44198741</guid></item><item><title><![CDATA[New comment by bytepoet in "The Who Cares Era"]]></title><description><![CDATA[
<p>Such a well-written and thoughtful blog post. Loved it!</p>
]]></description><pubDate>Wed, 28 May 2025 13:55:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44116075</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=44116075</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44116075</guid></item><item><title><![CDATA[New comment by bytepoet in "LLMs get lost in multi-turn conversation"]]></title><description><![CDATA[
<p>The inability of LLMs to ask for clarification was exactly the flaw we encountered when testing them on open-ended problems stated somewhat ambiguously. This was in the context of paradoxical situations, tested on DeepSeek-R1 and Claude-3.7-Sonnet. Blog post about our experiments: <a href="https://pankajpansari.github.io/posts/paradoxes/" rel="nofollow">https://pankajpansari.github.io/posts/paradoxes/</a></p>
]]></description><pubDate>Thu, 15 May 2025 05:52:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43992217</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=43992217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43992217</guid></item><item><title><![CDATA[New comment by bytepoet in "GPU Puzzles"]]></title><description><![CDATA[
<p>Thanks a lot, Sasha, for creating these. I found your LLM training puzzles to be excellent as well.</p>
]]></description><pubDate>Mon, 23 Sep 2024 13:27:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=41625866</link><dc:creator>bytepoet</dc:creator><comments>https://news.ycombinator.com/item?id=41625866</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41625866</guid></item></channel></rss>