<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: gchadwick</title><link>https://news.ycombinator.com/user?id=gchadwick</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 08:23:17 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=gchadwick" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by gchadwick in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Remember OpenAI decided GPT-2 was far too dangerous to unleash upon the world when they first trained it!</p>
]]></description><pubDate>Wed, 08 Apr 2026 13:57:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47690295</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47690295</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47690295</guid></item><item><title><![CDATA[New comment by gchadwick in "Show HN: A game where you build a GPU"]]></title><description><![CDATA[
<p>A nice game, though the truth table lightning round is pretty punishing! Big contrast to the circuit-building part, where you can take your time. Personally I'd drop the time requirements from that quiz section.</p>
]]></description><pubDate>Sat, 04 Apr 2026 19:17:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47642318</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47642318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47642318</guid></item><item><title><![CDATA[Nvidia Groq 3 LPX]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.nvidia.com/en-gb/data-center/lpx/">https://www.nvidia.com/en-gb/data-center/lpx/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47404205">https://news.ycombinator.com/item?id=47404205</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 16 Mar 2026 20:09:47 +0000</pubDate><link>https://www.nvidia.com/en-gb/data-center/lpx/</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47404205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47404205</guid></item><item><title><![CDATA[New comment by gchadwick in "OpenTitan Shipping in Production"]]></title><description><![CDATA[
<p>> Also we found the formal waiver analysis tools to be very effective for waiving unreachable code, in case you aren't using those.<p>Yes, we had used them, just never got them slickly integrated into the verification dashboard. We had used this kind of analysis for internal sign-off. You could generate the waivers manually and check them in, but that suffers from the problem discussed above. Plus, as OpenTitan was a cross-company project, you run into EDA licensing issues where not everyone has access to the same set of tools, and a UNR flow could be running fine on one partner's infrastructure but not be workable everywhere for a multitude of reasons.<p>The ideal would be for the nightly regression to run the UNR flow to generate the waivers and apply them when generating coverage, but as ever there's only so much engineering time to go around and always other priorities.</p>
]]></description><pubDate>Fri, 06 Mar 2026 09:08:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47272708</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47272708</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47272708</guid></item><item><title><![CDATA[New comment by gchadwick in "OpenTitan Shipping in Production"]]></title><description><![CDATA[
<p>You can see the latest nightly results here: <a href="https://opentitan.org/dashboard/index.html" rel="nofollow">https://opentitan.org/dashboard/index.html</a>; note there are some 100% figures.<p>Having spent several years working on OT, I can tell you that most of the gaps are things that should be waived anyway. Getting waiver files reliably integrated into that flow has been problematic, as those files are fragile: alter the RTL and they typically break, because they refer to things by line number or expect a particular expression to be identical to when the waiver was written.<p>This has all been examined and the holes have been deemed unconcerning. Yes, ideally there'd be full waivers documenting this, but as with any real-life engineering project you can't do everything perfectly! There is internal documentation explaining the rationale for why the holes aren't a problem, but it's not public.</p>
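<p>A toy sketch of the fragility (the file name and waiver format here are hypothetical, purely to illustrate the failure mode, not any real EDA tool's syntax):</p><pre><code># Toy illustration: a waiver recorded against (file, line) stops
# matching as soon as edits above it shift the RTL by a single line.
waivers = {("some_module.sv", 412): "unreachable FSM default branch"}

def coverage_hole_status(filename, line):
    return waivers.get((filename, line), "NOT WAIVED")

print(coverage_hole_status("some_module.sv", 412))  # waived today
# An unrelated edit adds one line above; the same hole is now at 413:
print(coverage_hole_status("some_module.sv", 413))  # silently unwaived
</code></pre>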
]]></description><pubDate>Fri, 06 Mar 2026 06:04:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47271459</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47271459</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47271459</guid></item><item><title><![CDATA[New comment by gchadwick in "OpenTitan Shipping in Production"]]></title><description><![CDATA[
<p>I worked on OpenTitan for around 5 years at lowRISC. It certainly has its ups and downs, but it's generated some great stuff and I'm very glad to see it hit proper volume production like this. Whilst there are definitely open-source chips out there, and lots more using bits of open source that don't actually advertise the fact, I believe this is the first chip with completely open RTL that's in a major production-volume use case.<p>One of the highlights of working on OpenTitan was the amount of interest we got from the academic community. Work they did could actually get factored into the first-generation silicon, making it stronger. Ordinarily chips like that are kept deeply under wraps, and by the time the wider security community can take a look at them development has long completed, so anything they might find could only affect generation 2 or 3 of the device.<p>Academic collaboration also helped get ahead in post-quantum crypto. This first-generation chip has limited capabilities there, but thanks to multiple academics using the design as a base for their own PQC work there was lots to draw on for future designs.<p>I'm no longer at lowRISC so I don't know where OpenTitan is going next, but I look forward to finding out.</p>
]]></description><pubDate>Thu, 05 Mar 2026 22:32:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47268204</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47268204</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47268204</guid></item><item><title><![CDATA[New comment by gchadwick in "The path to ubiquitous AI (17k tokens/sec)"]]></title><description><![CDATA[
<p>This is an interesting piece of hardware, though when they go multi-chip for larger models the speed will no doubt suffer.<p>They'll also be severely limited on context length, as it needs to sit in SRAM. Looks like the current one tops out at 6144 tokens, which I presume is a whole chip's worth. You'd also have to dedicate a whole chip to a single user, as there's likely only enough SRAM for one user's worth of context. I wonder how much time it takes them to swap users in/out? I wouldn't be surprised if this chip is severely underutilized (you can't use it all when running decode, as you have to run token by token for one user, and then there's idle time as you swap users in/out).<p>Maybe a more realistic deployment would have chips for linear layers and chips for attention? You could batch users through the shared-weight chips and then provision more or fewer attention chips as you want, which would be per user (or shared amongst a small group of 2-4 users).</p>
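<p>For a sense of why SRAM constrains context so hard, here's a back-of-envelope KV cache sizing sketch (all the model dimensions are assumptions for illustration, not this chip's actual figures):</p><pre><code># Back-of-envelope KV cache sizing. Model numbers are assumed
# for illustration; real models will differ.
n_layers = 32        # transformer layers (assumed)
n_kv_heads = 8       # KV heads per layer (assumed, e.g. with GQA)
head_dim = 128       # per-head dimension (assumed)
bytes_per_elem = 2   # fp16/bf16

# Each token stores a K and a V vector in every layer.
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
print(f"KV cache per token: {kv_per_token / 1024:.0f} KiB")  # 128 KiB

# SRAM needed for the quoted 6144-token context, for one user:
tokens = 6144
print(f"{tokens} tokens: {tokens * kv_per_token / 2**30:.2f} GiB")  # 0.75 GiB
</code></pre><p>Even at these modest (assumed) dimensions, one user's context runs to hundreds of MiB, which is an enormous amount of on-die SRAM.</p>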
]]></description><pubDate>Fri, 20 Feb 2026 12:46:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47087347</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47087347</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47087347</guid></item><item><title><![CDATA[New comment by gchadwick in "Two different tricks for fast LLM inference"]]></title><description><![CDATA[
<p>> If copying user tokens was the bottle neck, batching would not achieve any speed up.<p>Reality is more complex. As context length grows your KV cache becomes large and will begin to dominate your total FLOPs (and hence bytes loaded). The issue with the KV cache is that you cannot batch it, because only one user can use it, unlike static layer weights, which you can reuse across multiple users.<p>Emerging sparse attention techniques can greatly relieve this issue, though the extent to which frontier labs deploy them is uncertain. DeepSeek v3.2 uses sparse attention, though I don't know offhand how much this reduces KV cache FLOPs and the associated memory bandwidth.</p>
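<p>A minimal sketch of the asymmetry (the parameter count and cache size are assumptions, purely for illustration): weights are loaded once per decode step and shared by the whole batch, while each user's KV cache must be streamed separately.</p><pre><code># Bytes loaded per user per decoded token: weights vs KV cache.
# Sizes are illustrative assumptions, not any particular model.
weight_bytes = 70e9 * 2            # 70B params @ 2 bytes (assumed)
kv_bytes_per_user = 0.75 * 2**30   # per-user KV cache at long context (assumed)

for batch in (1, 8, 64):
    # Weights amortize across the batch; KV cache does not.
    per_user = weight_bytes / batch + kv_bytes_per_user
    print(f"batch={batch:2d}: {per_user / 2**30:6.1f} GiB loaded per user per token")
</code></pre><p>The weight term shrinks with batch size but the KV term is a fixed floor per user, which is why long contexts blunt the benefit of batching.</p>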
]]></description><pubDate>Sun, 15 Feb 2026 11:17:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47022857</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47022857</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47022857</guid></item><item><title><![CDATA[New comment by gchadwick in "No Coding Before 10am"]]></title><description><![CDATA[
<p>Anyone else find reading things like this slightly exhausting?<p>I'm very much pro AI for coding; there are clearly significant capabilities there, but I'm still getting my head around how best to utilise it.<p>Posts like these make it sound like ruthlessly optimizing your workflow every single day, letting no possible efficiency go, is the only way to work now. This has always been possible and has generally not been a good idea to focus on exclusively. There have always been processes to optimise and automate, and always a balance as to which to pursue.<p>Personally I am incorporating AI into my daily work, but not getting too bogged down by it. I read about some of the latest ideas and techniques and choose carefully which I employ. Sometimes I'll try an AI workflow and then abandon it. I recently connected Claude up to draw.io with an MCP; it had some good capabilities, but for the specific task I wanted it wasn't really getting it, so doing it manually was the better choice to achieve what I wanted in good time.<p>The models themselves and the coding harnesses are also evolving quickly, so complex workflows people put together can quickly become pointless.<p>More haste, less speed, as they say!</p>
]]></description><pubDate>Sun, 15 Feb 2026 10:35:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47022681</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=47022681</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47022681</guid></item><item><title><![CDATA[New comment by gchadwick in "RISC-V Vector Primer"]]></title><description><![CDATA[
<p>I've only taken a quick skim, but this looks like solid material!<p>RISC-V Vector is definitely tricky to get a handle on, especially if you just read the architecture documentation (which is to be expected really; a good specification for an architecture isn't compatible with a useful beginner's guide). I found I needed to look at some presentations given by various members of the vector working group to get a good grasp of the principles.<p>There's been precious little material beyond the specification and some now slightly old slide decks, so this is a great contribution.</p>
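<p>For anyone bouncing off the spec, the principle that takes the most getting used to is vector-length-agnostic strip-mining: code asks the hardware how many elements it will handle each pass (vsetvli) rather than hard-coding a vector width. A rough Python analogy of that loop structure (VLMAX here is a made-up hardware width, and vsetvli() only stands in for the instruction):</p><pre><code># Python analogy of an RVV strip-mined loop.
VLMAX = 8  # hypothetical hardware vector length, in elements

def vsetvli(remaining):
    # Grants up to VLMAX elements per pass, like the vsetvli instruction.
    return min(remaining, VLMAX)

def vector_add(dst, a, b):
    i, n = 0, len(dst)
    while i < n:
        vl = vsetvli(n - i)      # lanes granted for this pass
        for j in range(vl):      # models a single vector instruction
            dst[i + j] = a[i + j] + b[i + j]
        i += vl                  # advance by however many were done

out = [0] * 10
vector_add(out, list(range(10)), list(range(10)))
print(out)  # [0, 2, 4, ..., 18] regardless of VLMAX
</code></pre>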
]]></description><pubDate>Thu, 12 Feb 2026 09:23:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46986607</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=46986607</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46986607</guid></item><item><title><![CDATA[New comment by gchadwick in "Time Station Emulator"]]></title><description><![CDATA[
<p>A shame this hasn't shot to number 1 on HN and stayed there. At least it's getting a reasonable number of upvotes.<p>This is a truly fantastic piece of hacking, going by the original meaning of the word as used at the dawn of the computer era.</p>
]]></description><pubDate>Wed, 28 Jan 2026 12:36:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46794522</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=46794522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46794522</guid></item><item><title><![CDATA[New comment by gchadwick in "Nvidia to buy assets from Groq for $20B cash"]]></title><description><![CDATA[
<p>Another example of the growing trend of buying out key parts of a company to avoid an actual acquisition?<p>I wonder if equity-holding employees get anything from the deal, or indeed if all the investors will see a return from this?</p>
]]></description><pubDate>Wed, 24 Dec 2025 22:15:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46379848</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=46379848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46379848</guid></item><item><title><![CDATA[New comment by gchadwick in "NVIDIA frenemy relation with OpenAI and Oracle"]]></title><description><![CDATA[
<p>> However, Groq’s architecture relies on SRAM (Static RAM). Since SRAM is typically built in logic fabs (like TSMC) alongside the processors themselves, it theoretically shouldn't face the same supply chain crunch as HBM.<p>It's true SRAM comes with your logic: you get a TSMC N3 (or N6 or whatever) wafer, you've got SRAM. Unfortunately SRAM just doesn't have the capacity, so you have to augment with DRAM, which you see companies like D-Matrix and Cerebras doing. Perhaps you can use cheaper/more available LPDDR or GDDR (Nvidia have done this themselves with Rubin CPX), but that also has supply issues.<p>Note it's not really parameter storage (which you can amortize over multiple users) that gets you, it's KV cache storage, and that scales with the user count.<p>Now Groq does appear to be going for a pure-SRAM play, but if the easily available pure-SRAM thing comes at some multiple of the capital cost of the DRAM thing, it's not a simple escape hatch from DRAM availability.</p>
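<p>To see why it's the KV cache that scales, here's a capacity sketch (parameter count and per-user cache size are assumptions for illustration): parameters are paid for once however many users you serve, while KV cache is paid per user.</p><pre><code># Total memory vs concurrent users: parameters are shared, KV cache
# is per user. All sizes are illustrative assumptions.
param_bytes = 70e9 * 2        # 70B params @ 2 bytes (assumed)
kv_per_user = 0.25 * 2**30    # per-user KV cache (assumed)

for users in (1, 16, 256):
    total = param_bytes + users * kv_per_user
    kv_share = users * kv_per_user / total
    print(f"{users:3d} users: {total / 2**30:6.1f} GiB total, "
          f"{kv_share:5.1%} of it KV cache")
</code></pre>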
]]></description><pubDate>Mon, 08 Dec 2025 19:50:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=46196748</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=46196748</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46196748</guid></item><item><title><![CDATA[New comment by gchadwick in "Backpropagation is a leaky abstraction (2016)"]]></title><description><![CDATA[
<p>It's for a CS course at Stanford, not a PyTorch boot camp. It seems reasonable to expect some level of academic rigour and a need to learn and demonstrate understanding of the fundamentals. If researchers aren't learning the fundamentals in courses like these, where are they learning them?<p>You've also missed the point of the article: if you're building novel model architectures, you can't magic away the leakiness. You need to understand the backprop behaviours of the building blocks you use to achieve a good training run. Ignore these and what could, with some tweaks, be a good model architecture will either entirely fail to train or produce disappointing results.<p>Perhaps you're working at the level of bolting pre-built models together or training existing architectures on new datasets, but this course operates below that level, to teach you how things actually work.</p>
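<p>One of the classic leaks from the article, as a minimal PyTorch sketch (assuming torch is available): sigmoid looks like a harmless squashing function in the forward pass, but its gradient vanishes once the input saturates, silently stalling training.</p><pre><code>import torch

# d(sigmoid)/dx = sigmoid(x) * (1 - sigmoid(x)), which collapses
# towards zero as |x| grows -- the forward pass gives no hint of this.
for val in (0.0, 5.0, 10.0):
    x = torch.tensor([val], requires_grad=True)
    torch.sigmoid(x).backward()
    print(f"x = {val:4.1f}: d(sigmoid)/dx = {x.grad.item():.2e}")

# x =  0.0: d(sigmoid)/dx = 2.50e-01
# x =  5.0: d(sigmoid)/dx ~ 6.6e-03
# x = 10.0: d(sigmoid)/dx ~ 4.5e-05
</code></pre>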
]]></description><pubDate>Sun, 02 Nov 2025 08:58:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=45788864</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45788864</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45788864</guid></item><item><title><![CDATA[New comment by gchadwick in "Backpropagation is a leaky abstraction (2016)"]]></title><description><![CDATA[
<p>Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material, from short articles like this to longer writing like <a href="https://karpathy.github.io/2015/05/21/rnn-effectiveness/" rel="nofollow">https://karpathy.github.io/2015/05/21/rnn-effectiveness/</a> (on recurrent neural networks) and all of the stuff on YouTube.<p>Plus his GitHub. The recently released nanochat <a href="https://github.com/karpathy/nanochat" rel="nofollow">https://github.com/karpathy/nanochat</a> is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.</p>
]]></description><pubDate>Sun, 02 Nov 2025 07:20:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=45788468</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45788468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45788468</guid></item><item><title><![CDATA[New comment by gchadwick in "Alibaba's new AI chip: Key specifications comparable to H20"]]></title><description><![CDATA[
<p>To me at least, "not good after all" means their current latest hardware has issues that mean it cannot replace Nvidia GPUs yet. This is a hard problem, so not getting there yet doesn't imply bad engineering, just the scale of the challenge! It also doesn't imply that, if this generation is a miss, following generations couldn't be a large win. Indeed I think it would be very foolish to assume that Alibaba or other Chinese firms cannot build devices that challenge Nvidia here on the basis of the current generation not being up to it yet. As you say, they have a large market that's willing to wait for them to become good.<p>Plus it may not be true; this new Alibaba chip could turn out to be brilliant.</p>
]]></description><pubDate>Wed, 17 Sep 2025 14:58:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=45276646</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45276646</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45276646</guid></item><item><title><![CDATA[New comment by gchadwick in "Alibaba's new AI chip: Key specifications comparable to H20"]]></title><description><![CDATA[
<p>I'd say there's a mix of 'Chinese GPUs are not that good after all' and 'Nvidia doesn't have any magical secret sauce, and China could easily catch up' going on. Nvidia GPUs are indeed remarkable devices with a complex software stack that offers all kinds of possibilities you cannot replicate overnight (or over a year or two!).<p>However, they've also got a fair amount of generality: anything you might want to do that involves huge amounts of matmuls and vector maths you can probably map to a GPU and do a half-decent job of. This is good for things like model research and exploration of training methods.<p>Once this is all developed, you can cherry-pick a few specific things to be good at and build your own GPU concentrating on making those specific things work well (such as inference and training on Transformer architectures), and catch up to Nvidia on those aspects even if you cannot beat or match a GPU on every possible task. However, you don't care, as you only want to do some specific things well.<p>This is still hard, and model architectures and training approaches are continuously evolving. Simplify things too much and target some ultra-specific things and you end up with some pretty useless hardware that won't allow you to develop next year's models, nor run this year's particularly well; you can only develop and run last year's models. So you need to hit a sweet spot: enough flexibility to keep up with developments, but not so much that you have to totally replicate what Nvidia have done.<p>Ultimately the 'secret sauce' is just years of development producing a very capable architecture that offers huge flexibility across differing workloads. You can short-cut that development by reducing flexibility or by not caring that your architecture is rubbish at certain things (hence no magical secret sauce). This is still hard, and your first gen could suck quite a lot (hence 'not that good after all'), but when you've got a strong desire for an alternative hardware source you can probably put up with a lot of short-term pain for the long-term payoff.</p>
]]></description><pubDate>Wed, 17 Sep 2025 13:00:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=45275320</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45275320</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45275320</guid></item><item><title><![CDATA[New Prefill Specialised GPU – Nvidia Rubin CPX]]></title><description><![CDATA[
<p>Article URL: <a href="https://semianalysis.com/2025/09/10/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack/">https://semianalysis.com/2025/09/10/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45203305">https://news.ycombinator.com/item?id=45203305</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 10 Sep 2025 20:37:45 +0000</pubDate><link>https://semianalysis.com/2025/09/10/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack/</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45203305</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45203305</guid></item><item><title><![CDATA[New comment by gchadwick in "Apple A17 Pro Chip Hardware Flaw?"]]></title><description><![CDATA[
<p>If I'm reading this right, glitching the I2C bus prevents the Secure Enclave from booting. It seems the device recovers from this by itself ('Although the device recovered and remained operable'); maybe the Secure Enclave reboots itself after seeing a fault on the I2C bus?<p>No evidence of any security issue is presented, though the author certainly wants to drum it up as something major: 'This is a high-severity, unpatchable design flaw'.</p>
]]></description><pubDate>Sun, 07 Sep 2025 19:22:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45161332</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45161332</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45161332</guid></item><item><title><![CDATA[New comment by gchadwick in "The maths you need to start understanding LLMs"]]></title><description><![CDATA[
<p>I thought it was a great book; it dives into all the details and lays them out step by step with some nice examples. Obviously it's a pretty basic architecture and very simplistic training, but I found it gave me the grounding to then understand more complex architectures.</p>
]]></description><pubDate>Sat, 06 Sep 2025 20:02:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=45152411</link><dc:creator>gchadwick</dc:creator><comments>https://news.ycombinator.com/item?id=45152411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45152411</guid></item></channel></rss>