<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dhruvdh</title><link>https://news.ycombinator.com/user?id=dhruvdh</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 15:32:23 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dhruvdh" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dhruvdh in "Pocket TTS: A high quality TTS that gives your CPU a voice"]]></title><description><![CDATA[
<p>Try `uvx pocket-tts serve`</p>
]]></description><pubDate>Fri, 16 Jan 2026 14:09:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=46646472</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=46646472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46646472</guid></item><item><title><![CDATA[Accelerating LLM Inference with Parallel Draft Models (PARD)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.amd.com/en/developer/resources/technical-articles/accelerating-generative-llms-interface-with-parallel-draft-model-pard.html">https://www.amd.com/en/developer/resources/technical-articles/accelerating-generative-llms-interface-with-parallel-draft-model-pard.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43656675">https://news.ycombinator.com/item?id=43656675</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 11 Apr 2025 18:10:02 +0000</pubDate><link>https://www.amd.com/en/developer/resources/technical-articles/accelerating-generative-llms-interface-with-parallel-draft-model-pard.html</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=43656675</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43656675</guid></item><item><title><![CDATA[New comment by dhruvdh in "Aiter: AI Tensor Engine for ROCm"]]></title><description><![CDATA[
<p>El Capitan can also do FP8. HPC generally requires double precision, but people are trying to make low precision work.</p>
]]></description><pubDate>Mon, 24 Mar 2025 18:08:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=43463796</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=43463796</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43463796</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD RDNA 4 – AMD Radeon RX 9000 Series Graphics Cards"]]></title><description><![CDATA[
<p>To be fair, you can buy ~3 of these for the price Nvidia charges for 24GB/32GB models.</p>
]]></description><pubDate>Fri, 28 Feb 2025 13:53:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=43205571</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=43205571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43205571</guid></item><item><title><![CDATA[New comment by dhruvdh in "ROCm Device Support Wishlist"]]></title><description><![CDATA[
<p>To add, AMD only makes _parts_ of an MI300X server.<p>It's like asking a tire manufacturer to give you a car for free.</p>
]]></description><pubDate>Mon, 20 Jan 2025 22:17:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=42773748</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42773748</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42773748</guid></item><item><title><![CDATA[New comment by dhruvdh in "My failed attempt at AGI on the Tokio Runtime"]]></title><description><![CDATA[
<p>I wish more people would try things just like this and blog about their failures.<p>> The published version of a proof is always condensed. And even if you take all the math that has been published in the history of mankind, it’s still small compared to what these models are trained on.<p>> And people only publish the success stories. The data that are really precious are from when someone tries something, and it doesn’t quite work, but they know how to fix it. But they only publish the successful thing, not the process.<p>- Terence Tao (<a href="https://www.scientificamerican.com/article/ai-will-become-mathematicians-co-pilot/" rel="nofollow">https://www.scientificamerican.com/article/ai-will-become-ma...</a>)<p>Personally, I think failures on their own are valuable. Others can come in and branch off from a decision you made in a way that instead leads to success. Maybe the idea can be applied to a different domain. Maybe your failure clarified something for someone.</p>
]]></description><pubDate>Thu, 26 Dec 2024 16:52:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=42516218</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42516218</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42516218</guid></item><item><title><![CDATA[New comment by dhruvdh in "MI300X vs. H100 vs. H200 Benchmark Part 1: Training – CUDA Moat Still Alive"]]></title><description><![CDATA[
<p>Disappointed that there wasn’t anything on inference performance in the article at all. That’s what the major customers have announced they use it for.</p>
]]></description><pubDate>Mon, 23 Dec 2024 04:26:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42491771</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42491771</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42491771</guid></item><item><title><![CDATA[New comment by dhruvdh in "CUDA Moat Still Alive"]]></title><description><![CDATA[
<p>Which algorithm to pick for a given matrix shape differs, and it's not straightforward to figure out. AMD currently wants you to “tune” ops, likely searching for the right algorithm for your shapes, while Nvidia has accurate heuristics for picking the right algorithm.</p>
]]></description><pubDate>Mon, 23 Dec 2024 04:21:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=42491739</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42491739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42491739</guid></item><item><title><![CDATA[Open-sourcing Three EXAONE 3.5 Models: 2.4B, 7.8B, 32B]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.lgresearch.ai/blog/view?seq=507">https://www.lgresearch.ai/blog/view?seq=507</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42367356">https://news.ycombinator.com/item?id=42367356</a></p>
<p>Points: 13</p>
<p># Comments: 4</p>
]]></description><pubDate>Mon, 09 Dec 2024 15:55:24 +0000</pubDate><link>https://www.lgresearch.ai/blog/view?seq=507</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42367356</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42367356</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD outsells Intel in the datacenter space"]]></title><description><![CDATA[
<p>> despite them being fabless<p>That's not how it works. You need to pump money into fabs to get them working, and Intel doesn't have money. If AMD had fabs burning their cash, they would also have a much lower valuation.<p>The market is completely irrational on AMD. Their 52-week high is ~$225 and 52-week low is ~$90. $225 was hit when AMD was guiding ~$3.5B in datacenter GPU revenue. Now they're guiding to end the year at $5B+ in datacenter GPU revenue, but the stock is ~$140?<p>I think it's because of how early Nvidia announced Blackwell (it isn't shipping in any meaningful volume yet), and the market thinks AMD needs to compete with GB200 when they're actually competing with H200 this quarter. And for whatever reason the market thinks AMD will get zero AI growth next year? I don't know how to explain the stock price.<p>Anyway, they hit record quarterly revenue this Q3 and are guiding to beat that record by ~$1B next quarter. The price might move a lot based on how AMD guides for Q1 2025.</p>
]]></description><pubDate>Wed, 06 Nov 2024 00:01:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=42056312</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42056312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42056312</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD outsells Intel in the datacenter space"]]></title><description><![CDATA[
<p>> Performance per watt was better for Intel<p>No, it's not even close. AMD is miles ahead.<p>This is a Phoronix review for Turin (the current generation): <a href="https://www.phoronix.com/review/amd-epyc-9965-9755-benchmarks/14" rel="nofollow">https://www.phoronix.com/review/amd-epyc-9965-9755-benchmark...</a><p>You can similarly find Phoronix reviews for the Genoa, Bergamo, and Milan generations (the last two generations).</p>
]]></description><pubDate>Tue, 05 Nov 2024 21:17:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=42055249</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42055249</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42055249</guid></item><item><title><![CDATA[New comment by dhruvdh in "Nvidia (NVDA) to Replace Intel in the Dow Jones Industrial Average"]]></title><description><![CDATA[
<p>Is AMD behind hyperscalers' in-house efforts? Outside of Google, I don't think so.</p>
]]></description><pubDate>Sat, 02 Nov 2024 00:06:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=42022873</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=42022873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42022873</guid></item><item><title><![CDATA[New comment by dhruvdh in "AI PCs Aren't Good at AI: The CPU Beats the NPU"]]></title><description><![CDATA[
<p>Oh, maybe also change the title? I flagged it because of the title/url not matching.</p>
]]></description><pubDate>Thu, 17 Oct 2024 00:28:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=41865253</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41865253</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41865253</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD Instinct MI325X to Feature 256GB HBM3E Memory, CDNA4-Based MI355X with 288GB"]]></title><description><![CDATA[
<p>I don't think having a common ancestry for the ISA means much, or even having the same ISA.<p>Anyway, I don't understand what you want from me or what you're arguing about. They were trying to win the datacenter CPU market, not the GPU market, and they did well at that. They've recently started trying to win the GPU market as well, because now they can afford to. They seem to be doing well now.</p>
]]></description><pubDate>Fri, 11 Oct 2024 13:42:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=41809422</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41809422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41809422</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD Instinct MI325X to Feature 256GB HBM3E Memory, CDNA4-Based MI355X with 288GB"]]></title><description><![CDATA[
<p>Those are Vega, not CDNA. It wouldn't surprise me if those are rebranded consumer chips, though I haven't checked.</p>
]]></description><pubDate>Fri, 11 Oct 2024 13:17:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41809225</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41809225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41809225</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD Instinct MI325X to Feature 256GB HBM3E Memory, CDNA4-Based MI355X with 288GB"]]></title><description><![CDATA[
<p>And yet Meta is using MI300X exclusively for all live inference on Llama 405B.<p>Clearly there are workloads AMD wins at, and just going Nvidia by default for everything without considering AMD is suboptimal.</p>
]]></description><pubDate>Fri, 11 Oct 2024 13:00:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=41809099</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41809099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41809099</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD Instinct MI325X to Feature 256GB HBM3E Memory, CDNA4-Based MI355X with 288GB"]]></title><description><![CDATA[
<p>You know AMD primarily sells CPUs, right?<p>For datacenter GPUs, they're going from ~$500M-750M for full-year 2023 (can't find proper numbers) to $4.5B+ for full-year 2024. In GPUs, it's almost like they're entering a new market.<p>The current Instinct line of products is relatively new too; I found this article [1] on the MI100 launch in Nov 2020. That's basically the start of 2021.<p>To go from MI100 in 2021 to $4.5B+ of MI300X + MI250X in 2024 is great. They are doing just fine.<p>On MI355X, I can't find endnotes for the slides they show, so it is not clear whether the 9.2PF of FP6 and FP4 is sparse or not (all the other numbers on that slide were non-sparse). If it isn't, they're exceeding GB200's sparse FP6/4 numbers with non-sparse FLOPS (!). They both have the same memory bandwidth, though. AMD is doing just fine.<p>[1] <a href="https://www.servethehome.com/amd-radeon-instinct-mi100-32gb-cdna-gpu-launched/" rel="nofollow">https://www.servethehome.com/amd-radeon-instinct-mi100-32gb-...</a></p>
]]></description><pubDate>Fri, 11 Oct 2024 12:56:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=41809066</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41809066</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41809066</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD GPU Inference"]]></title><description><![CDATA[
<p>Batching is how you get ~350 tokens/sec on Qwen 14b with vLLM on a 7900 XTX: by running 15 requests at once.<p>Also, there is a Dockerfile.rocm at the root of vLLM's repo. How is it a pain?</p>
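<p>The throughput win from batching can be sketched with a toy client-side simulation. This is illustrative only: it does not use vLLM's actual API, and the 0.1s per-request latency is a made-up stand-in for server-side generation time.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated request: with continuous batching, each request's wall time
# stays roughly constant whether the server handles one request or many,
# because the GPU processes them together.
def fake_generate(prompt: str) -> str:
    time.sleep(0.1)  # stand-in for server-side generation latency
    return f"completion for {prompt!r}"

prompts = [f"prompt {i}" for i in range(15)]

# Sequential client: total time ~ 15 * latency
start = time.perf_counter()
for p in prompts:
    fake_generate(p)
sequential = time.perf_counter() - start

# Concurrent client: the 15 requests overlap, total time ~ 1 * latency,
# so aggregate tokens/sec scales with the number of in-flight requests.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=15) as pool:
    list(pool.map(fake_generate, prompts))
concurrent = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

<p>Against a real vLLM server the same effect comes from simply issuing many requests at once and letting the scheduler batch them.</p>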
]]></description><pubDate>Wed, 02 Oct 2024 20:56:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=41724901</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41724901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41724901</guid></item><item><title><![CDATA[New comment by dhruvdh in "AMD GPU Inference"]]></title><description><![CDATA[
<p>Why would you use this over vLLM?</p>
]]></description><pubDate>Wed, 02 Oct 2024 18:02:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=41723376</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=41723376</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41723376</guid></item><item><title><![CDATA[New comment by dhruvdh in "Tinygrad 0.9.0"]]></title><description><![CDATA[
<p>What's the point of the 8000 LOC limit? Has anyone worked on a project with a LOC limit? Why was the limit in place?</p>
]]></description><pubDate>Tue, 28 May 2024 19:58:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=40504914</link><dc:creator>dhruvdh</dc:creator><comments>https://news.ycombinator.com/item?id=40504914</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40504914</guid></item></channel></rss>