Hacker News: zander_jiang

New comment by zander_jiang in "Modern GPU Programming for MLSys"

zander_jiang — Sun, 28 Jun 2026 23:34:11 +0000

The book is really well written.

New comment by zander_jiang in "SpaceX to buy Cursor for $60B"

zander_jiang — Tue, 16 Jun 2026 19:39:26 +0000

I wonder what happens to Fireworks AI, which provided the infra to train and serve Composer 2. Cursor was their largest customer, and they're probably losing it.

New comment by zander_jiang in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"

zander_jiang — Tue, 09 Jun 2026 06:54:39 +0000

tilert is a highly optimized megakernel: a single kernel that does the entire decode pass. This enables overlapping weight loading with computation, eliminates CUDA launch overhead (CUDA graphs do not, contrary to what most people think), and allows more fine-grained pipelining. There are lots of blogs/papers on it. It's currently the best approach to maximizing memory bandwidth utilization. But megakernels are incredibly hard to optimize and only work for small batch sizes (low throughput, hence high price); that's why we don't see them much in production.
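
To make the launch-overhead point concrete, here's a toy back-of-envelope model in Python. It's a sketch, not a measurement: every number below (bytes read per token, bandwidth, kernel count, per-kernel gap) is a made-up assumption, and `decode_step_us` is a hypothetical helper, not anything from tilert. It just treats a decode step as "time to stream the weights" plus a fixed gap for each kernel boundary.

```python
def decode_step_us(weight_bytes, bandwidth_bytes_per_s, num_kernels, per_kernel_gap_us):
    """Rough per-token decode latency in microseconds.

    Assumes decode is memory-bound (time = bytes / bandwidth) and that each
    kernel boundary adds a fixed idle gap. Both are simplifications.
    """
    load_us = weight_bytes / bandwidth_bytes_per_s * 1e6
    return load_us + num_kernels * per_kernel_gap_us


# Hypothetical numbers, chosen only for illustration.
WEIGHT_BYTES = 20e9   # bytes of weights read per token (assumed)
BANDWIDTH = 4e12      # 4 TB/s memory bandwidth (assumed)
GAP_US = 3.0          # idle gap per kernel boundary, in microseconds (assumed)

# Many small kernels per decode step vs. one megakernel.
multi_kernel_us = decode_step_us(WEIGHT_BYTES, BANDWIDTH, num_kernels=500, per_kernel_gap_us=GAP_US)
megakernel_us = decode_step_us(WEIGHT_BYTES, BANDWIDTH, num_kernels=1, per_kernel_gap_us=GAP_US)

print(f"multi-kernel: {multi_kernel_us:.0f} us/token, {1e6 / multi_kernel_us:.0f} tok/s")
print(f"megakernel:   {megakernel_us:.0f} us/token, {1e6 / megakernel_us:.0f} tok/s")
```

The model ignores the overlap and pipelining benefits entirely, so if anything it understates the gap. It also only makes sense at small batch sizes, where the weight-streaming term dominates.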

New comment by zander_jiang in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"

zander_jiang — Tue, 09 Jun 2026 06:45:14 +0000

tilert is closed source; the repo is just a Python wrapper that invokes the binary.