<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ademeure</title><link>https://news.ycombinator.com/user?id=ademeure</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 15 May 2026 18:06:43 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ademeure" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ademeure in "RTX 5090 and M4 MacBook Air: Can It Game?"]]></title><description><![CDATA[
<p>Apple GPUs didn’t have tensor cores until the M5 (aka “a neural accelerator in each core”), which is why in the article’s charts an M5 Pro significantly beats an M4 Max (whereas in other workloads the gap would be much smaller, since a Pro is ~1/2 of a Max).<p>EDIT: since Aurornis beat me by 3 minutes, I’ll add another interesting tidbit instead :)<p>NVIDIA tensor cores on consumer GPUs are massively less powerful per SM than on their datacenter counterparts (which also makes it easier to get them to peak efficiency on consumer GPUs, because with bigger tensor cores the rest of the pipeline becomes the bottleneck much more quickly, as per Amdahl’s Law).<p>This is potentially changing with Vera Rubin CPX, which looks an awful lot like an RTX 5090 replacement but with the full-blown datacenter tensor cores (which otherwise won’t be available unless you pay for the datacenter SKU) - so it will have very high TFLOPS relative to its bandwidth.<p>The target market for the CPX is exactly this: prefill and Time To First Token. You can basically just throw compute at the problem for (parts of) prefill performance (though past a certain point it won’t help anything else), and the 5090/M5 are nowhere near that limit.<p>So the design choice for NVIDIA/Apple/etc of how much silicon to spend on this for consumer GPUs is mostly dictated by economics and how much they can reuse the same chips across the different markets.</p>
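<p>As a rough illustration of why prefill can be compute-bound while decode stays bandwidth-bound, here is a minimal roofline sketch in Python (the peak-TFLOPS and bandwidth numbers are made-up assumptions, not the 5090’s or M5’s actual specs):<p><pre><code># Roofline sketch: attainable throughput = min(peak compute, bandwidth * arithmetic intensity).
# All hardware numbers below are illustrative assumptions only.

PEAK_TFLOPS = 200.0  # assumed dense fp16 tensor throughput
BW_TBS = 1.8         # assumed memory bandwidth in TB/s

def attainable_tflops(flops_per_byte):
    return min(PEAK_TFLOPS, BW_TBS * flops_per_byte)

# For a d x d fp16 weight matrix, processing n tokens costs ~2*n*d^2 FLOPs and
# streams ~2*d^2 bytes of weights, so arithmetic intensity is roughly n FLOPs/byte.
for n_tokens, phase in [(1, "decode (1 token)"), (2048, "prefill (2048 tokens)")]:
    print(phase, "->", round(attainable_tflops(n_tokens)), "TFLOPS attainable")
# decode lands around 2 TFLOPS (bandwidth-bound); prefill hits the compute roof.
</code></pre>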
]]></description><pubDate>Thu, 14 May 2026 17:44:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=48138669</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=48138669</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48138669</guid></item><item><title><![CDATA[New comment by ademeure in "I'm glad the Anthropic fight is happening now"]]></title><description><![CDATA[
<p>There's definitely something to be said for giving interesting people a platform to express their views unconditionally. Unfortunately, that can also be a very dangerous thing. I have been less and less impressed over the years with Lex's approach here.<p>I'm personally very glad that Dwarkesh isn't like that. He's not perfect, but I think he's doing a way better job than other podcasters in the field right now.</p>
]]></description><pubDate>Wed, 11 Mar 2026 21:25:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47342172</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=47342172</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47342172</guid></item><item><title><![CDATA[New comment by ademeure in "AutoKernel: Autoresearch for GPU Kernels"]]></title><description><![CDATA[
<p>This is very cool!<p>I've been working on something somewhat similar over the last few weeks, but trying to be much more general and arguably over-engineered! I like the scope of this project; keeping it limited to Triton and specific kinds of kernels makes it quite simple and efficient.<p>I'm confused by the progress graph though; it looks like it's benchmarking a 4096x4096x4096 fp16 matmul rather than a full repo, and it claims a 1.31x improvement vs cuBLAS... while running at 187 TFLOPS, which is 18.9% of peak utilization? cuBLAS definitely gets much closer to peak than that - most likely it's limited by CPU overhead or something else? Benchmarking is hard!<p>Either way I'm excited to see other people working on this; I think it's an <i>extremely</i> promising area over the next 6 months.</p>
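<p>For reference, the quick arithmetic behind those numbers (a Python sketch; it assumes the graph really is a single 4096x4096x4096 fp16 matmul, and that the peak is whatever figure the tool used):<p><pre><code># Back-of-the-envelope check on the 187 TFLOPS / 18.9%-of-peak figures.
M = N = K = 4096
flops = 2 * M * N * K  # ~1.37e11 FLOPs (one multiply-accumulate counted as 2 FLOPs)

measured_tflops = 187.0
runtime_ms = flops / (measured_tflops * 1e12) * 1e3
print(f"implied runtime: {runtime_ms:.3f} ms")  # ~0.735 ms

utilization = 0.189
print(f"implied peak: {measured_tflops / utilization:.0f} TFLOPS")  # ~989 TFLOPS, roughly H100-class dense fp16
</code></pre>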
]]></description><pubDate>Wed, 11 Mar 2026 10:44:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333919</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=47333919</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333919</guid></item><item><title><![CDATA[New comment by ademeure in "ChatGPT with voice is now available to all free users"]]></title><description><![CDATA[
<p>Do you mean Siri's voice recognition? If so, 100% agreed. My iOS shortcut uses OpenAI's Whisper API for voice recognition, and Siri (English United Kingdom - Siri Voice 1) for text-to-speech.<p>I really like dictating things sometimes, and Whisper is perfect for that (automatic paragraphs inside the model itself would be nice, but it's not a big deal).<p>If anyone is interested, the "Whisper speech recognition on iOS" part is based on this shortcut I found, which you can easily use yourself on both iOS and macOS (free except for the OpenAI API usage fees, obviously): <a href="https://giacomomelzi.com/transcribe-audio-messages-iphone-ai/" rel="nofollow noreferrer">https://giacomomelzi.com/transcribe-audio-messages-iphone-ai...</a></p>
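<p>If you'd rather skip Shortcuts entirely, the underlying API call is tiny. A minimal Python sketch (the file name is a placeholder; it assumes OPENAI_API_KEY is set in your environment):<p><pre><code># Transcribe an audio file with OpenAI's Whisper API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("voice_memo.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)  # plain text; Whisper won't insert paragraph breaks for you
</code></pre>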
]]></description><pubDate>Wed, 22 Nov 2023 00:01:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=38372473</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=38372473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38372473</guid></item><item><title><![CDATA[New comment by ademeure in "ChatGPT with voice is now available to all free users"]]></title><description><![CDATA[
<p>Am I the only one who doesn't like these specific voices? The quality is incredible, but they feel too cheery/enthusiastic/casual and it just gets annoying after a while.<p>I made an iOS shortcut a while ago that uses Siri with the ChatGPT app (it has iOS shortcut bindings) and despite Siri being a useless pile of junk compared to this, I actually prefer Siri's voice to this in some ways, because it doesn't feel so over the top.<p>Maybe this is partly because of different cultural expectations between the USA and Europe? Or maybe I'm just being too cynical and ChatGPT really is that happy talking with me!...</p>
]]></description><pubDate>Tue, 21 Nov 2023 22:47:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=38371594</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=38371594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38371594</guid></item><item><title><![CDATA[New comment by ademeure in "OpenAI's board has fired Sam Altman"]]></title><description><![CDATA[
<p>You're right, I was imagining that he decided to hide the (full extent of the?) breakthrough from the board and do things covertly for some reason that could warrant firing him, but that's a pretty unlikely prior: why would he hide it from the board in the first place, given AGI is literally the board's mission? One reason might be that he wants to slow down AGI progress until they've made more progress on safety and hid it for that reason, and the board disagrees - but that sounds too much like a movie script to be real, and is very unlikely!<p>As I said, while I do have a mostly positive opinion of Sam Altman (I disagree with him on certain things, but I trust him a lot more than the vast majority of tech CEOs and politicians, and I'd rather he be in the room when true superhuman intelligence is created than them), I hope this has nothing to do with AGI and it's "just" a personal scandal.</p>
]]></description><pubDate>Fri, 17 Nov 2023 22:55:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=38311776</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=38311776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38311776</guid></item><item><title><![CDATA[New comment by ademeure in "OpenAI's board has fired Sam Altman"]]></title><description><![CDATA[
<p>Fair enough, but having worked for an extremely secretive FAANG myself, "we need XYZ" is the kind of thing I'd expect to hear if you have XYZ internally but don't want to reveal it yet. It could basically mean "we need XYZ relative to the previous product" or, more specifically, "we need another breakthrough beyond LLMs, and we recently made a major breakthrough unrelated to LLMs". I'm not saying that's the case, but I don't think the signal-to-noise ratio in his answer is very high.<p>More importantly, OpenAI's claim (whether you believe it or not) has always been that their structure is optimised towards building AGI, and that everything else including the for-profit part is just a means to that end: <a href="https://openai.com/our-structure" rel="nofollow noreferrer">https://openai.com/our-structure</a> and <a href="https://openai.com/blog/openai-lp" rel="nofollow noreferrer">https://openai.com/blog/openai-lp</a><p>Either the board doesn't actually share that goal, or what you are saying shouldn't matter to them. Sam isn't an engineer; it's not his job to make the breakthrough, only to keep the lights on until they do, if you take their mission literally.<p>Unless you're arguing that Sam claimed to the board that they were closer to AGI than they really are (rather than hiding anything from them) in order to use the not-for-profit part of the structure in a way the board disagreed with, or some other financial shenanigans?<p>As I said, I hope you're right, because the alternative is a lot scarier.</p>
]]></description><pubDate>Fri, 17 Nov 2023 22:40:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=38311570</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=38311570</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38311570</guid></item><item><title><![CDATA[New comment by ademeure in "OpenAI's board has fired Sam Altman"]]></title><description><![CDATA[
<p>In a panel yesterday, Sam implied OpenAI had a major breakthrough a few weeks ago:<p>"Like 4 times now in the history of OpenAI, the most recent time was just in the last couple of weeks, I've gotten to be in the room when we sort of like, pushed the veil of ignorance back and the frontier of discovery forward. And getting to do that is like the professional honor of a lifetime".<p><a href="https://www.youtube.com/watch?v=ZFFvqRemDv8#t=13m22s">https://www.youtube.com/watch?v=ZFFvqRemDv8#t=13m22s</a><p>This is going to sound terrible, but I really hope this is a financial or ethical scandal about Sam Altman personally and he did something terribly wrong, because the alternative is that this is about how close we are to true AGI.<p>Superhuman intelligence could be a wonderful thing if done right, but the world is not ready for a fast take-off, and it seems the governance structure of OpenAI certainly wouldn't be ready for it either.</p>
]]></description><pubDate>Fri, 17 Nov 2023 22:13:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=38311193</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=38311193</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38311193</guid></item><item><title><![CDATA[New comment by ademeure in "Nomnoml"]]></title><description><![CDATA[
<p>I made a few tools using nomnoml in the past, including control flow & dependency graphs for GPU assembly code. I really like it; the only negative was that there's no reliable way to push certain things to be positioned close together, so for extremely large diagrams it sometimes makes bad choices and gets a bit messy.<p>The code is reasonably easy to modify as well, even if it isn't fully documented, so I was able to hack in tooltips on hover and make certain boxes clickable, linking to other diagrams. I'm really grateful for this awesome tool being open source!</p>
]]></description><pubDate>Mon, 02 Oct 2023 14:17:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=37738461</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=37738461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37738461</guid></item><item><title><![CDATA[New comment by ademeure in "Arm’s Cortex A510: Two Kids in a Trench Coat"]]></title><description><![CDATA[
<p>As a former GPU architect, that's really interesting, thanks! I didn't realise the A53's caches were strictly in-order and couldn't service hits ahead of misses; I always assumed this was something even much simpler designs were capable of.<p>I think complexity of verification as an argument against out-of-order is questionable, because <i>if</i> out-of-order resulted in a better core and a competitor did manage to build and properly verify such a core, then they would have a strong competitive advantage. But that might not be true in practice given the area/power cost.<p>As an aside: different GPU vendors also have different limitations when it comes to in-order vs out-of-order caches, and GPUs have the extra complexity that loads are effectively doing a "gather", e.g. 32-wide warps doing a load with 32 addresses that may or may not uniquify, so a single "return" to the shader processor may be anything from 1 to 32 (or even 64) cachelines.<p>And GPUs get even trickier with the texture unit doing trilinear+anisotropic filtering, so a single pixel may require 32x as many inputs, and you may even get into situations where the cache isn't big enough (or doesn't have enough ways) to handle the worst case, and you have to revert to in-order for certain modes, or process things at a finer granularity than entire warps! Or just do in-order for everything with huge latency FIFOs and accept the latency cost. There are lots of different ways to handle this, also depending on what granularity of returns your shader processor can handle. As you said, both modern CPUs and GPUs can't really be defined using simple labels.<p>Gather makes things a lot harder for load pipelines, so I'm not surprised Zen4 seems to still just split it into uOps, but I'm curious exactly how Intel handles it in their CPU microarchitecture. Sadly this is the kind of thing that's practically impossible to know as an outsider!</p>
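<p>To make the gather point concrete, here's a toy Python sketch (not any vendor's actual hardware behaviour; the 128-byte cacheline size and 4-byte accesses are just assumptions) showing how the same 32-lane load can touch anywhere from 1 to 32+ cachelines:<p><pre><code># Count distinct cachelines touched by one warp's 32 addresses.
CACHELINE_BYTES = 128  # assumed line size

def cachelines_touched(addresses, access_bytes=4):
    lines = set()
    for addr in addresses:
        lines.add(addr // CACHELINE_BYTES)
        lines.add((addr + access_bytes - 1) // CACHELINE_BYTES)  # an access may straddle a line
    return len(lines)

coalesced = [lane * 4 for lane in range(32)]     # consecutive fp32 -> 1 cacheline
strided   = [lane * 4096 for lane in range(32)]  # one cacheline per lane -> 32
print(cachelines_touched(coalesced), cachelines_touched(strided))  # 1 32
</code></pre>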
]]></description><pubDate>Mon, 02 Oct 2023 14:02:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=37738271</link><dc:creator>ademeure</dc:creator><comments>https://news.ycombinator.com/item?id=37738271</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37738271</guid></item></channel></rss>