<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dlewis1788</title><link>https://news.ycombinator.com/user?id=dlewis1788</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 12:57:24 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dlewis1788" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dlewis1788 in "GCP Outage"]]></title><description><![CDATA[
<p>Looks like more than KV is having an issue. Just tried to load dash.cloudflare.com and no bueno.</p>
]]></description><pubDate>Thu, 12 Jun 2025 18:21:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44261017</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=44261017</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44261017</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework"]]></title><description><![CDATA[
<p>100% - probably why vLLM is now the default back-end in Dynamo.</p>
]]></description><pubDate>Tue, 01 Apr 2025 16:50:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=43548948</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43548948</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43548948</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Ask HN: Why hasn’t AMD made a viable CUDA alternative?"]]></title><description><![CDATA[
<p>100% valid - Nvidia is trying to address that now with cuTile and the new Python front-end for CUTLASS.</p>
]]></description><pubDate>Tue, 01 Apr 2025 16:40:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=43548849</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43548849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43548849</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Ask HN: Why hasn’t AMD made a viable CUDA alternative?"]]></title><description><![CDATA[
<p>CUDA is an entire ecosystem - not a single programming language extension (C++) or a single library, but a collection of libraries & tools for specific use cases and optimizations (cuDNN, CUTLASS, cuBLAS, NCCL, etc.). Nvidia also provides supporting tooling such as profilers. Many of the libraries build on other libraries. Even if AMD had decent, reliable language extensions for general-purpose GPU programming, they still wouldn't have the libraries and supporting ecosystem to match what CUDA provides today - and that represents more than a decade of development effort by Nvidia.</p>
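<p>To make the layering concrete, here's a rough Python sketch (untested here; assumes a CUDA build of PyTorch is installed) showing how much of that ecosystem a single framework quietly depends on:</p><pre><code>import torch

# Each line below touches a different piece of the CUDA ecosystem that
# PyTorch builds on rather than reimplements.
print("CUDA toolkit:", torch.version.cuda)        # toolkit PyTorch was built against
print("cuDNN:", torch.backends.cudnn.version())   # DNN primitives (convolutions, etc.)
print("NCCL:", torch.cuda.nccl.version())         # multi-GPU collectives

x = torch.randn(1024, 1024, device="cuda")
y = x @ x   # matmul dispatched to cuBLAS under the hood
</code></pre>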
]]></description><pubDate>Tue, 01 Apr 2025 14:59:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43547586</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43547586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43547586</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework"]]></title><description><![CDATA[
<p>Just curious what your issues with Triton were. We've done OK using it to serve LLMs with a classifier head via the HF Transformers pipeline & Flash Attention 2, as well as serving text-generation models with the vLLM back-end.</p>
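<p>For reference, the classifier-head case was roughly a Triton Python-backend model along these lines (sketch only, untested here; the model id and the TEXT/LABEL tensor names are placeholders that need a matching config.pbtxt):</p><pre><code>import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import pipeline

class TritonPythonModel:
    def initialize(self, args):
        # Placeholder model id - swap in the actual LLM + classifier head.
        self.clf = pipeline("text-classification", model="my-org/my-classifier", device=0)

    def execute(self, requests):
        responses = []
        for request in requests:
            texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            texts = [t.decode("utf-8") for t in texts.flatten()]
            preds = self.clf(texts)
            labels = np.array([p["label"].encode("utf-8") for p in preds], dtype=np.object_)
            out = pb_utils.Tensor("LABEL", labels)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
</code></pre>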
]]></description><pubDate>Wed, 19 Mar 2025 01:13:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43407299</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=43407299</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43407299</guid></item><item><title><![CDATA[New comment by dlewis1788 in "TPU transformation: A look back at 10 years of our AI-specialized chips"]]></title><description><![CDATA[
<p>For training, yes, but there's no indication about inference workloads. Apple has said they would use their own silicon for inference in the cloud.</p>
]]></description><pubDate>Sun, 04 Aug 2024 14:09:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=41153663</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=41153663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41153663</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>Someone commented below that with enough batchnorm/layernorm/etc. and/or gradient clipping you can manage it, but BF16 just makes life easier if you can live without some precision.</p>
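<p>A minimal PyTorch sketch of that trade-off (untested here; assumes a CUDA device): the FP16 path needs the loss scaler and usually gradient clipping to stay in range, while the BF16 path can typically drop both:</p><pre><code>import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()   # only needed for the FP16 path
x = torch.randn(32, 512, device="cuda")

# FP16: loss scaling plus gradient clipping to keep values in range.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()
scaler.unscale_(opt)
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(opt)
scaler.update()
opt.zero_grad()

# BF16: the extended range means the scaler (and often the clipping) can go.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).square().mean()
loss.backward()
opt.step()
opt.zero_grad()
</code></pre>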
]]></description><pubDate>Mon, 03 Jul 2023 18:18:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576872</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576872</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>I didn't even know about Apple's AMX instructions until I clicked on your link. Very interesting - thanks!</p>
]]></description><pubDate>Mon, 03 Jul 2023 18:14:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576804</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576804</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576804</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>My understanding is that for certain types of networks BF16 will train better than FP16: the extended dynamic range gives additional protection against exploding gradients and overflowing loss values, at the cost of some precision.</p>
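<p>The range difference is easy to see from torch.finfo - bfloat16 keeps FP32's exponent range and gives up mantissa bits:</p><pre><code>import torch

for dt in (torch.float16, torch.bfloat16, torch.float32):
    fi = torch.finfo(dt)
    print(dt, "max:", fi.max, "eps:", fi.eps)

# float16  max: 65504          eps: ~9.8e-04  (more precision, tiny range)
# bfloat16 max: ~3.39e+38      eps: ~7.8e-03  (FP32-sized range, less precision)
</code></pre>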
]]></description><pubDate>Mon, 03 Jul 2023 18:10:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576739</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576739</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>Confirmed the Apple M1 lacks bfloat16 support completely:</p><pre><code>M1: hw.optional.arm.FEAT_BF16: 0
M2: hw.optional.arm.FEAT_BF16: 1</code></pre>
]]></description><pubDate>Mon, 03 Jul 2023 18:06:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=36576673</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36576673</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36576673</guid></item><item><title><![CDATA[New comment by dlewis1788 in "Bfloat16 support coming to Apple's Metal and PyTorch [video]"]]></title><description><![CDATA[
<p>Somehow missed this from WWDC23, but it looks like Sonoma will add support for bfloat16 in Metal, and there's an active PR to add support to the PyTorch MPS back-end (PR #99272). Since the M2 added bfloat16 support at the hardware level, I'm assuming this will only be supported on M2 Macs.<p>That maxed-out Mac Studio M2 w/ 192GB of memory now looks more appealing...</p>
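<p>Quick sketch of what that should enable (untested; assumes macOS Sonoma on an M2 and a PyTorch build that includes the MPS bfloat16 work from PR #99272):</p><pre><code>import torch

assert torch.backends.mps.is_available()
x = torch.randn(4, 4, device="mps", dtype=torch.bfloat16)
y = (x @ x).float().cpu()   # compute in bf16 on the GPU, read back as fp32
print(y.dtype, y.shape)
</code></pre>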
]]></description><pubDate>Mon, 03 Jul 2023 16:42:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=36575444</link><dc:creator>dlewis1788</dc:creator><comments>https://news.ycombinator.com/item?id=36575444</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36575444</guid></item></channel></rss>