Hacker News: cwzwarich

New comment by cwzwarich in "AMD's Ryzen 9 9950X3D2 Dual Edition crams 208MB of cache into a single chip"

cwzwarich — Sat, 28 Mar 2026 03:39:40 +0000

https://github.com/coreboot/coreboot/blob/main/src/soc/intel...

New comment by cwzwarich in "The post-GeForce era: What if Nvidia abandons PC gaming?"

cwzwarich — Sat, 20 Dec 2025 16:53:30 +0000

Nvidia doesn't share dies between their high-end datacenter products like B200 and consumer products. The high-end consumer dies have many more SMs than a corresponding datacenter die. Each has functionality that the other does not within an SM/TPC, nevermind the very different fabric and memory subsystem (with much higher bandwidth/SM on the datacenter parts). They run at very different clock frequencies. It just wouldn't make sense to share the dies under these constraints, especially when GPUs already present a fairly obvious yield recovery strategy.

New comment by cwzwarich in "TPUs vs. GPUs and why Google is positioned to win AI race in the long term"

cwzwarich — Thu, 27 Nov 2025 18:43:17 +0000

Isn’t the 9000 TFLOP/s number Nvidia’s relatively useless sparse FLOP count that is 2x the actual dense FLOP count?

New comment by cwzwarich in "Why don't you use dependent types?"

cwzwarich — Sun, 02 Nov 2025 17:43:05 +0000

Adding axioms to simple type theory is more awkward than adding them to a set theory like ZFC. One approach to universes I’ve seen in Isabelle/HOL world is to postulate the existence of a universe as a model of set theory. But then you’re stuck reasoning semantically about a model of set theory. Nobody has scaled this up to a large pluralist math library like Mathlib.

New comment by cwzwarich in "Why don't you use dependent types?"

cwzwarich — Sun, 02 Nov 2025 16:22:13 +0000

The bigger problem with HOL (or simple type theory) is not the lack of dependencies, but rather the lack of logical strength. Simple type theory is equivalent in logical strength to bounded Zermelo set theory (i.e. ZF without Foundation or Replacement, and with Separation restricted to formulas with bounded quantifiers). This is unfortunately too weak to formalize post-WW2 mathematics in the same style as is done by ordinary mathematicians. Similarly, it does not offer a great way to deal with the size issues that arise in e.g. category theory.

New comment by cwzwarich in "Hard Rust requirements from May onward"

cwzwarich — Sat, 01 Nov 2025 19:53:47 +0000

The original purpose of the C standard was to solve the problems created by the diversity of increasingly divergent implementations of C. They studied existing behavior across systems, proposed new language constructs, and it was generally a success (look at the proliferation of C in the 90s across many different systems and architectures).

The actual informal semantics in the standard and its successors is written in an axiomatic (as opposed to operational or denotational) style, and is subject to the usual problem of axiomatic semantics: one rule you forgot to read can completely change the meaning of the other rules you did read. There are a number of areas known to be ill-specified in the standard, with the worst probably being the implications of the typed memory model. There have since been formalized semantics of C, which are generally less general than the informal version in the standard and make some additional assumptions.

C++ tried to follow the same model, but C++ is orders of magnitude more complex than C and thus the standard is overall less well specified than the C++ standard (e.g. there is still no normative list of all the undefined behavior in C++). It is likely practically impossible to write a formal specification for C++. Still, essentially all of the work on memory models for low-level programming languages originates in the context of C++ (and then ported back to C and Rust).

New comment by cwzwarich in "Hard Rust requirements from May onward"

cwzwarich — Sat, 01 Nov 2025 16:42:41 +0000

The Rust specification you link is performative and only intended to satisfy requirements of certification processes. No one is actually using it to implement the language, as far as I am aware.

There is other work on specifying Rust (e.g. the Unsafe Code Guidelines Working Group), but nothing approaching a real spec for the whole language. Honestly, it is probably impossible at this point; Rust has many layers of implementation-defined hidden complexities.

New comment by cwzwarich in "Apple will phase out Rosetta 2 in macOS 28"

cwzwarich — Tue, 28 Oct 2025 17:49:01 +0000

My reasons for leaving Apple had nothing to do with this decision. I was already no longer working on Rosetta 2 in a day-to-day capacity, although I would still frequently chat with the team and give input on future directions.

New comment by cwzwarich in "Compiling with Continuations"

cwzwarich — Sat, 20 Sep 2025 13:31:19 +0000

CPS is fairly dead as an IR, but the (local) CPS transform seems more popular than ever with languages implementing "stackless" control effects.

As far as functional IRs go, I would say SSA corresponds most directly to (a first-order restriction of) ANF w/ join points. The main difference being the use of dominance-based scoping rules, which is certainly more convenient than juggling binders when writing transformations. The first-order restriction isn't even essential, e.g. https://compilers.cs.uni-saarland.de/papers/lkh15_cgo.pdf.

If you're interested in an IR that can express continuations (or evaluation contexts) in a first-class way, a much better choice than CPS is an IR based on the sequent calculus. As I'm sure you know (since you work with one of the coauthors), this was first experimented with in a practical way in GHC (https://pauldownen.com/publications/scfp_ext.pdf), but there is a paper in this year's OOPSLA (https://dl.acm.org/doi/10.1145/3720507) that explores this more fundamentally, without the architectural constraint of being compatible with all of the other decisions already made in GHC. Of course, one could add dominance scoping etc. to make an extension of SSA with more symmetry between producers and consumers like the classical sequent calculus.

New comment by cwzwarich in "Compiling with Continuations"

cwzwarich — Sat, 20 Sep 2025 13:15:07 +0000

> Call/ret instructions work really well with branch predictors. Lots of jumps (to continuations) might not work quite as well.

On x86, the use of paired call/return is conflated with usage of the stack to store activation records. On AArch64, indirect branches can be marked with a hint bit indicating that they are a call or return, so branch prediction can work exactly the same with activation records elsewhere.

New comment by cwzwarich in "Advanced Scheme Techniques (2004) [pdf]"

cwzwarich — Sat, 13 Sep 2025 01:27:30 +0000

There's an OOPSLA paper (referred to from the link starting that thread) from this year with 2 of the same authors that goes into more detail about using it as a compiler IR: https://dl.acm.org/doi/10.1145/3720507

New comment by cwzwarich in "The future of 32-bit support in the kernel"

cwzwarich — Mon, 01 Sep 2025 19:55:10 +0000

It shouldn’t be difficult to write a binary translator to run 32-bit executables on a 64-bit userspace. You will take a small performance hit (on top of the performance hit of using the 32-bit architecture to begin with), but that should be fine for anything old enough to not be recompiled.

New comment by cwzwarich in "Typechecker Zoo"

cwzwarich — Mon, 18 Aug 2025 21:20:10 +0000

Yes, in fact Lean proves the law of the excluded middle using Diaconescu's theorem rather than assuming it as an independent axiom:

https://github.com/leanprover/lean4/blob/ad1a017949674a947f0...

New comment by cwzwarich in "Typechecker Zoo"

cwzwarich — Mon, 18 Aug 2025 17:08:47 +0000

Lean’s type theory extends CIC with the (global) axiom of choice, which increases consistency strength over base CIC.

New comment by cwzwarich in "Load-Store Conflicts"

cwzwarich — Sun, 04 May 2025 19:30:01 +0000

Interesting! IIRC, the LLVM passes dedicated to dodging this issue were contributed by Intel engineers, so maybe there’s some bias.

New comment by cwzwarich in "Load-Store Conflicts"

cwzwarich — Sun, 04 May 2025 19:13:52 +0000

> This is a significant problem on AMD; Intel and Apple seems to be better.

When did this change? In my testing years ago (while I was writing Rosetta 2, so Icelake-era Intel), Intel only allowed a load to forward from a single store, and no partial forwarding (i.e. mixed cache/register) without a huge penalty, whereas AMD at least allowed partial forwarding (or had a considerably lower penalty than Intel).

New comment by cwzwarich in "Less Slow C++"

cwzwarich — Fri, 18 Apr 2025 16:46:12 +0000

Intel has always had terrible subnormal performance. It's not that difficult to implement in HW, and even if you still want to optimize for the normalized case, we're talking about a 1 cycle penalty, not an order(s)-of-magnitude penalty.

New comment by cwzwarich in "C and C++ prioritize performance over correctness (2023)"

cwzwarich — Mon, 31 Mar 2025 19:49:36 +0000

> it's the natural implementation in hardware

The natural implementation in hardware is that addition of two N-bit numbers produces an N+1-bit number. Most architectures even expose this extra bit as a carry bit.

New comment by cwzwarich in "The inevitability of the borrow checker"

cwzwarich — Sat, 08 Feb 2025 02:54:17 +0000

Cyclone had borrowing. See Section 4.4 (Temporary Aliasing) in the paper

https://homes.cs.washington.edu/~djg/papers/cyclone_memory.p...

or the more detailed discussion throughout this journal paper:

https://homes.cs.washington.edu/~djg/papers/cyclone_scp.pdf

As their citations indicate, the idea of borrowing appeared immediately in the application of subtructural logics to programming languages, back to Wadler's "Linear types can change the world!". It's just too painful without it.

New comment by cwzwarich in "Rosetta 2 creator leaves Apple to work on Lean full-time"

cwzwarich — Sun, 22 Dec 2024 21:04:22 +0000

Thanks. It means a lot coming from someone with experience in our niche field.