<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dlewis1788</title><link>https://news.ycombinator.com/user?id=dlewis1788</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 12:57:24 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dlewis1788" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dlewis1788 in "GCP Outage"]]></title><description><![CDATA[
<p>Looks like more than KV is having an issue. Just tried to load dash.cloudflare.com and no bueno.</p>
]]></description><pubDate>Thu, 12 Jun 2025 18:21:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44261017</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=44261017</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44261017</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework"]]></title><description><![CDATA[
<p>100% - probably why vLLM is now the default back-end in Dynamo.</p>
]]></description><pubDate>Tue, 01 Apr 2025 16:50:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=43548948</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43548948</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43548948</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Ask HN: Why hasn’t AMD made a viable CUDA alternative?"]]></title><description><![CDATA[
<p>100% valid - Nvidia is trying to address that now with cuTile and the new Python front-end for CUTLASS.</p>
]]></description><pubDate>Tue, 01 Apr 2025 16:40:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=43548849</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43548849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43548849</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Ask HN: Why hasn’t AMD made a viable CUDA alternative?"]]></title><description><![CDATA[
<p>CUDA is an entire ecosystem - not a single programming language extension (C++) or a single library, but a collection of libraries & tools for specific use cases and optimizations (cuDNN, CUTLASS, cuBLAS, NCCL, etc.). Nvidia also provides supporting tooling such as profilers. Many of the libraries build on other libraries. Even if AMD had decent, reliable language extensions for general-purpose GPU programming, they still wouldn't have the libraries and supporting ecosystem to match what CUDA provides today - and that represents more than a decade of development effort by Nvidia.</p>
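<p>To make the layering concrete, here's a rough Python sketch (untested here; assumes a CUDA build of PyTorch is installed) showing how much of that ecosystem a single framework quietly depends on:</p><pre><code>import torch

# Each line below touches a different piece of the CUDA ecosystem that
# PyTorch builds on rather than reimplements.
print("CUDA toolkit:", torch.version.cuda)        # toolkit PyTorch was built against
print("cuDNN:", torch.backends.cudnn.version())   # DNN primitives (convolutions, etc.)
print("NCCL:", torch.cuda.nccl.version())         # multi-GPU collectives

x = torch.randn(1024, 1024, device="cuda")
y = x @ x   # matmul dispatched to cuBLAS under the hood
</code></pre>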
]]></description><pubDate>Tue, 01 Apr 2025 14:59:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43547586</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43547586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43547586</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework"]]></title><description><![CDATA[
<p>Just curious what your issues with Triton were. We've done OK using it to serve LLMs with a classifier head via the HF Transformers pipeline & Flash Attention 2, as well as serving text-generation models with the vLLM back-end.</p>
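<p>For reference, the classifier-head case was roughly a Triton Python-backend model along these lines (sketch only, untested here; the model id and the TEXT/LABEL tensor names are placeholders that need a matching config.pbtxt):</p><pre><code>import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import pipeline

class TritonPythonModel:
    def initialize(self, args):
        # Placeholder model id - swap in the actual LLM + classifier head.
        self.clf = pipeline("text-classification", model="my-org/my-classifier", device=0)

    def execute(self, requests):
        responses = []
        for request in requests:
            texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            texts = [t.decode("utf-8") for t in texts.flatten()]
            preds = self.clf(texts)
            labels = np.array([p["label"].encode("utf-8") for p in preds], dtype=np.object_)
            out = pb_utils.Tensor("LABEL", labels)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
</code></pre>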
]]></description><pubDate>Wed, 19 Mar 2025 01:13:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43407299</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43407299</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43407299</guid></item><item><title><![CDATA[New comment by dlewis1788 in "TPU transformation: A look back at 10 years of our AI-specialized chips"]]></title><description><![CDATA[
<p>For training, yes, but there's no indication about inference workloads. Apple has said they would use their own silicon for inference in the cloud.</p>
]]></description><pubDate>Sun, 04 Aug 2024 14:09:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=41153663</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=41153663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41153663</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>Someone commented below that with enough batchnorm/layernorm/etc. and/or gradient clipping you can manage it, but BF16 just makes life easier if you can live without some precision.</p>
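<p>A minimal PyTorch sketch of that trade-off (untested here; assumes a CUDA device): the FP16 path needs the loss scaler and usually gradient clipping to stay in range, while the BF16 path can typically drop both:</p><pre><code>import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()   # only needed for the FP16 path
x = torch.randn(32, 512, device="cuda")

# FP16: loss scaling plus gradient clipping to keep values in range.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()
scaler.unscale_(opt)
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(opt)
scaler.update()
opt.zero_grad()

# BF16: the extended range means the scaler (and often the clipping) can go.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).square().mean()
loss.backward()
opt.step()
opt.zero_grad()
</code></pre>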
]]></description><pubDate>Mon, 03 Jul 2023 18:18:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576872</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576872</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>I didn't even know about Apple's AMX instructions until I clicked on your link. Very interesting - thanks!</p>
]]></description><pubDate>Mon, 03 Jul 2023 18:14:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576804</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576804</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576804</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>My understanding is that for certain types of networks BF16 will train better than FP16: the extended dynamic range gives additional protection against exploding gradients and overflowing loss values, at the cost of some precision.</p>
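<p>The range difference is easy to see from torch.finfo - bfloat16 keeps FP32's exponent range and gives up mantissa bits:</p><pre><code>import torch

for dt in (torch.float16, torch.bfloat16, torch.float32):
    fi = torch.finfo(dt)
    print(dt, "max:", fi.max, "eps:", fi.eps)

# float16  max: 65504          eps: ~9.8e-04  (more precision, tiny range)
# bfloat16 max: ~3.39e+38      eps: ~7.8e-03  (FP32-sized range, less precision)
</code></pre>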
]]></description><pubDate>Mon, 03 Jul 2023 18:10:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576739</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576739</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>Confirmed the Apple M1 lacks bfloat16 support completely:</p><pre><code>M1: hw.optional.arm.FEAT_BF16: 0
M2: hw.optional.arm.FEAT_BF16: 1</code></pre>
]]></description><pubDate>Mon, 03 Jul 2023 18:06:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576673</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576673</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576673</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>Somehow missed this from WWDC23, but it looks like Sonoma will add support for bfloat16 in Metal, and there's an active PR to add support to the PyTorch MPS back-end (PR #99272). Since the M2 added bfloat16 support at the hardware level, I'm assuming this will only be supported on M2 Macs.<p>That maxed-out Mac Studio M2 w/ 192GB of memory now looks more appealing...</p>
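<p>Quick sketch of what that should enable (untested; assumes macOS Sonoma on an M2 and a PyTorch build that includes the MPS bfloat16 work from PR #99272):</p><pre><code>import torch

assert torch.backends.mps.is_available()
x = torch.randn(4, 4, device="mps", dtype=torch.bfloat16)
y = (x @ x).float().cpu()   # compute in bf16 on the GPU, read back as fp32
print(y.dtype, y.shape)
</code></pre>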
]]></description><pubDate>Mon, 03 Jul 2023 16:42:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=36575444</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36575444</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36575444</guid></item></channel></rss>