<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: zander_jiang</title><link>https://news.ycombinator.com/user?id=zander_jiang</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 04 Jul 2026 13:45:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=zander_jiang" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by zander_jiang in "Modern GPU Programming for MLSys"]]></title><description><![CDATA[
<p>The book is really well written.</p>
]]></description><pubDate>Sun, 28 Jun 2026 23:34:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48712928</link><dc:creator>zander_jiang</dc:creator><comments>https://news.ycombinator.com/item?id=48712928</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48712928</guid></item><item><title><![CDATA[New comment by zander_jiang in "SpaceX to buy Cursor for $60B"]]></title><description><![CDATA[
<p>I wonder what happens to Fireworks AI, which provided the infra to train and serve Composer 2. Cursor was their largest customer, and they're probably losing it.</p>
]]></description><pubDate>Tue, 16 Jun 2026 19:39:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=48560793</link><dc:creator>zander_jiang</dc:creator><comments>https://news.ycombinator.com/item?id=48560793</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48560793</guid></item><item><title><![CDATA[New comment by zander_jiang in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"]]></title><description><![CDATA[
<p>tilert is a highly optimized megakernel: a single kernel that runs the entire decode pass. This enables overlapping weight loading with computation, eliminates CUDA kernel launch overhead (CUDA Graphs do not, contrary to what most people think), and allows for more fine-grained pipelining. There are lots of blogs and papers on it. It's currently the best approach for maximizing memory bandwidth utilization. But megakernels are incredibly hard to optimize and only work for small batch sizes (low throughput, hence the high price), which is why we don't see them much in production.</p>
]]></description><pubDate>Tue, 09 Jun 2026 06:54:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48457506</link><dc:creator>zander_jiang</dc:creator><comments>https://news.ycombinator.com/item?id=48457506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48457506</guid></item><item><title><![CDATA[New comment by zander_jiang in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"]]></title><description><![CDATA[
<p>tilert is closed source; the repo is just a Python wrapper that invokes the binary.</p>
]]></description><pubDate>Tue, 09 Jun 2026 06:45:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48457438</link><dc:creator>zander_jiang</dc:creator><comments>https://news.ycombinator.com/item?id=48457438</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48457438</guid></item></channel></rss>