<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: purplesyringa</title><link>https://news.ycombinator.com/user?id=purplesyringa</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 08:35:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=purplesyringa" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by purplesyringa in "Mark's Magic Multiply"]]></title><description><![CDATA[
<p>And here are the constants for the ostensibly more useful 54x54->108 product: <a href="https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=cbedb2bcf918107fe7565ba2397edda2" rel="nofollow">https://play.rust-lang.org/?version=stable&mode=release&edit...</a><p>I tried to go even higher, but the bounds seem to break at 55 bits.</p>
]]></description><pubDate>Tue, 14 Apr 2026 01:14:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47760038</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47760038</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47760038</guid></item><item><title><![CDATA[New comment by purplesyringa in "Mark's Magic Multiply"]]></title><description><![CDATA[
<p>I think it fails because it seems like the difference between 32-bit and 64-bit floats is 2x, but in reality we should look at the mantissa, where the increase from 23 bits to 52 bits is much greater.<p>That said, I managed to tweak this method to work with 3 multiplications.<p>ETA: I just realized you wanted to use 32x32 -> 64 products, while my approach assumes the existence of 64x64 -> 64 products; basically it's just a scaled-up version of the original question and likely not what you're looking for. Hopefully it's still useful though.<p>First, remove the bottom 8 bits of the two inputs and compute the 44x44->88 product. This can be done with the approach in the post. Then apply the algorithm again, combining that product with the low 64 bits of the full modular product to get the full 52x52->104 output. The bounds are a bit tight, but it should work. Here's a numeric example:<p><pre><code>    a = 98a67ee86f8cf
    b = da19d2c9dfe71

    (a >> 20) * (b >> 20)         = 820d2e04637bf428
    (a >> 8) * (b >> 8) % 2**64   =       0547f8cdb2100210
    ->
    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210

    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210
    (a * b) % 2**64               =           080978075f64355f
    ->
    a * b                         = 820d2e0548080978075f64355f
</code></pre>
And my attempt at implementation: <a href="https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=56dbb7f56d1e3cf5c4af6a5d5ffa7e06" rel="nofollow">https://play.rust-lang.org/?version=stable&mode=release&edit...</a></p>
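<p>The stitching step above can be sketched in Rust. Note this uses an exact integer product for the 88-bit half in place of the float-derived one, so it only demonstrates the carry resolution, not the float trick itself:</p>

```rust
// Combine the exact 88-bit product (a >> 8) * (b >> 8) with the low
// 64 bits of a * b (what a modular 64x64 -> 64 multiply returns) to
// reconstruct the full 104-bit product of two 52-bit inputs. The
// shifted high product underestimates a * b by less than 2^62, so its
// top 40 bits are off by at most one carry, which the exact low bits
// disambiguate.
fn stitch(p_hi: u128, lo: u64) -> u128 {
    let approx = p_hi << 16; // aligns (a >> 8) * (b >> 8) with a * b
    let mut hi = approx >> 64; // candidate top 40 bits
    if lo < approx as u64 {
        hi += 1; // the exact low 64 bits wrapped around: propagate a carry
    }
    (hi << 64) | lo as u128
}

fn main() {
    // The 52-bit inputs from the numeric example above.
    let a: u64 = 0x98a67ee86f8cf;
    let b: u64 = 0xda19d2c9dfe71;
    let p_hi = (a >> 8) as u128 * (b >> 8) as u128;
    let lo = a.wrapping_mul(b);
    assert_eq!(stitch(p_hi, lo), a as u128 * b as u128);
}
```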
]]></description><pubDate>Mon, 13 Apr 2026 19:58:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47757081</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47757081</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47757081</guid></item><item><title><![CDATA[New comment by purplesyringa in "Programming Used to Be Free"]]></title><description><![CDATA[
<p>> Very tangential, but I could swear QBasic included an on-disk documentation system accessible from the editor. Maybe only later versions?<p>Perhaps my installation didn't include it, or maybe you're confusing it with QuickBASIC, a more feature-complete IDE with a compiler (instead of just an interpreter). I don't exactly remember.</p>
]]></description><pubDate>Mon, 13 Apr 2026 16:56:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47754878</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47754878</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47754878</guid></item><item><title><![CDATA[New comment by purplesyringa in "Mark's Magic Multiply"]]></title><description><![CDATA[
<p>I don't think it's possible to apply this trick to 64-bit floats on a 64-bit architecture, which OP mentions in the last sentence. You need a 52 x 52 -> 104 product. Modular 64 x 64 -> 64 multiplication gives you the 64 bottom bits exactly, and widening 32 x 32 -> 64 multiplication approximately gives you the top 32 bits. That leaves 104 - 64 - 32 = 8 bits that are not accounted for at all. Compare with the 32-bit case, where the same arithmetic gives 46 - 32 - 16 = -2, i.e. a 2-bit overlap the method relies on.</p>
]]></description><pubDate>Mon, 13 Apr 2026 15:40:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47753626</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47753626</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47753626</guid></item><item><title><![CDATA[New comment by purplesyringa in "Programming Used to Be Free"]]></title><description><![CDATA[
<p>You can still write code without LLMs, much like you can write code without modern IDEs, or use C and assembly instead of higher-level languages. But there are significant differences between the skills you learn in the process, which I believe inhibits upward mobility.</p>
]]></description><pubDate>Mon, 13 Apr 2026 13:24:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47751628</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47751628</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47751628</guid></item><item><title><![CDATA[New comment by purplesyringa in "All elementary functions from a single binary operator"]]></title><description><![CDATA[
<p>The formulas are provided in the supplementary information file, as mentioned in the paper. <a href="https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformation.pdf" rel="nofollow">https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformat...</a> You want page 9.</p>
]]></description><pubDate>Mon, 13 Apr 2026 06:47:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47748515</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47748515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47748515</guid></item><item><title><![CDATA[New comment by purplesyringa in "Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets"]]></title><description><![CDATA[
<p>I must admit I'm surprised to see this -- Lemire offhandedly mentioned in the famous remainder blog post (<a href="https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/" rel="nofollow">https://lemire.me/blog/2019/02/08/faster-remainders-when-the...</a>) that 64-bit constants can be used for 32-bit division, and even provided a short example to compute the remainder that way (though not the quotient). Looking a bit more, it seems like libdivide didn't integrate this optimization either.<p>I <i>guess</i> everyone just assumed this was so well known by now that compilers had certainly integrated it, and no one actually bothered to submit a patch until now, when it was reinvented?</p>
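<p>For reference, the remainder computation from that post can be sketched as follows (a transcription of Lemire's fastmod idea; treat it as an illustration rather than the exact code from the post):</p>

```rust
// 32-bit remainder by a constant divisor d > 1 via a 64-bit magic
// constant c = ceil(2^64 / d): n mod d is the high 64 bits of
// (c * n mod 2^64) * d.
fn fastmod_u32(n: u32, d: u32) -> u32 {
    debug_assert!(d > 1);
    let c: u64 = u64::MAX / d as u64 + 1; // ceil(2^64 / d) for d > 1
    let lowbits = c.wrapping_mul(n as u64);
    ((lowbits as u128 * d as u128) >> 64) as u32
}

fn main() {
    assert_eq!(fastmod_u32(123456789, 10), 9);
    assert_eq!(fastmod_u32(u32::MAX, 7), u32::MAX % 7);
}
```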
]]></description><pubDate>Mon, 13 Apr 2026 06:27:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47748329</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47748329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47748329</guid></item><item><title><![CDATA[New comment by purplesyringa in "Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets"]]></title><description><![CDATA[
<p>The paper doesn't require a bitshift after multiplication -- it directly uses the high half of the product as the quotient, so it saves at least one tick over the solution you mentioned. And on x86, saturating addition can't be done in a tick and 32->64 zero-extension is implicit, so the gap is even wider.</p>
]]></description><pubDate>Mon, 13 Apr 2026 06:20:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47748291</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47748291</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47748291</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>Honorable mention: byte swapping instructions (originally added to CPUs for endianness conversion) can also be used to redistribute entropy, but they're slightly slower than rotations on Intel, which is why I think they aren't utilized much.</p>
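<p>A toy Rust illustration (swap_bytes compiles to a single bswap on x86; the values are made up):</p>

```rust
fn main() {
    // Entropy confined to the low byte...
    let h: u64 = 0x00000000000000ab;
    // ...moves to the top byte after a byte swap, where it would
    // survive a "take the top bits" bucket index.
    assert_eq!(h.swap_bytes(), 0xab00000000000000);
    // A rotation redistributes it similarly, typically faster on Intel.
    assert_eq!(h.rotate_left(56), 0xab00000000000000);
}
```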
]]></description><pubDate>Sun, 12 Apr 2026 10:11:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737967</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737967</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737967</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>I think the reason real-world implementations don't do this is to speed up access when the key is a small integer. Say, if your IDs are spread uniformly between 1 and 1000, taking the bottom 7 bits is a great hash, while the top 7 bits would just be zeros. So it's optimizing for a trivial hash rather than a general-purpose fast hash.<p>And since most languages require each data type to provide its own hash function, you kind of have to assume that the hash is half-assed and bottom bits are better. I think only Rust could make decisions differently here, since it's parametric over hashers, but I haven't seen that done.</p>
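<p>A minimal Rust sketch of the trade-off, with the identity function standing in for a language's trivial integer hash:</p>

```rust
use std::collections::HashSet;

fn main() {
    // IDs spread uniformly between 1 and 1000, "hashed" with the identity.
    // Bottom 7 bits: all 128 buckets of a 128-slot table get used.
    let low: HashSet<u64> = (1u64..=1000).map(|id| id & 127).collect();
    assert_eq!(low.len(), 128);
    // Top 7 bits of the 64-bit hash: a single bucket (always zero).
    let high: HashSet<u64> = (1u64..=1000).map(|id| id >> 57).collect();
    assert_eq!(high.len(), 1);
}
```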
]]></description><pubDate>Sun, 12 Apr 2026 09:11:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737529</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737529</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737529</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>I work at the machine code level, so the only characteristic I'm interested in is how many ticks it takes to compute the result, not how many transistors it requires or anything like that. All modern CPUs take 1 tick to compute XOR, addition, and many other simple arithmetic operations alike, so even though addition is technically more complicated in CPU designs, that complexity never surfaces in software. In the context of this post, I preferred addition over XOR to reduce cancel-out and propagate entropy between bits.</p>
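<p>To illustrate the cancel-out and carry-propagation point with a contrived example:</p>

```rust
fn main() {
    let (x, y) = (0b0111u8, 0b0001u8);
    // XOR treats every bit position independently: no propagation.
    assert_eq!(x ^ y, 0b0110);
    // Addition lets a carry ripple out of the low bits into a higher one.
    assert_eq!(x + y, 0b1000);
    // And XOR of correlated inputs can cancel entropy entirely.
    assert_eq!(x ^ x, 0);
}
```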
]]></description><pubDate>Sun, 12 Apr 2026 09:04:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737492</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737492</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737492</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>Yes, that's my point. It's not true that <i>all</i> hash functions have this characteristic, but most fast ones do. (And if you're using a slow-and-high-quality hash function, the distinction doesn't matter, so might as well use top bits.)</p>
]]></description><pubDate>Sun, 12 Apr 2026 09:01:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737473</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737473</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>Djb2 is hardly a proven good hash :) It's really easy to find collisions for it, and it's not seeded, so you're kind of screwed regardless. It's the odd middle ground between "safely usable in practice" and "fast in practice", which turns out to be "neither safe nor fast" in this case.</p>
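<p>To make the "easy to find collisions" claim concrete: in the h * 33 + c variant of djb2, two-character strings collide whenever 33 * c1 + c2 matches, and any shared suffix preserves the collision (a toy Rust check, not from the thread):</p>

```rust
// The additive variant of djb2: h = h * 33 + c, seeded with 5381.
fn djb2(s: &[u8]) -> u64 {
    s.iter()
        .fold(5381u64, |h, &c| h.wrapping_mul(33).wrapping_add(c as u64))
}

fn main() {
    // 33 * b'a' + b'b' == 33 * b'b' + b'A' == 3299, hence a collision:
    assert_eq!(djb2(b"ab"), djb2(b"bA"));
    // Equal intermediate states stay equal under any common suffix.
    assert_eq!(djb2(b"abfoo"), djb2(b"bAfoo"));
}
```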
]]></description><pubDate>Sun, 12 Apr 2026 08:59:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737468</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737468</guid></item><item><title><![CDATA[An alternative derivation of Shannon entropy]]></title><description><![CDATA[
<p>Article URL: <a href="https://iczelia.net/posts/shannon-deriv/">https://iczelia.net/posts/shannon-deriv/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47431869">https://news.ycombinator.com/item?id=47431869</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 18 Mar 2026 21:51:15 +0000</pubDate><link>https://iczelia.net/posts/shannon-deriv/</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47431869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47431869</guid></item><item><title><![CDATA[New comment by purplesyringa in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>"nobody cares about BigInt addition performance" is an odd claim to make when half of the world's cryptography is based on ECC.</p>
]]></description><pubDate>Mon, 16 Mar 2026 21:38:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47405284</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47405284</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47405284</guid></item><item><title><![CDATA[New comment by purplesyringa in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>I suspect that LLVM is optimized for compiling with `-ftrapv`, perhaps for cheap sanitizing or maybe just due to design decisions like using unsigned integers everywhere (please correct me if I'm wrong). I'm personally interested in how RISC-V behaves on computational tasks where computing carry is a known bottleneck, like long addition. Maybe looking at libgmp could be interesting, though I suspect absolute numbers will not be meaningful, and there's no baseline to compare them to.</p>
]]></description><pubDate>Thu, 12 Mar 2026 05:40:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47346922</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47346922</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47346922</guid></item><item><title><![CDATA[New comment by purplesyringa in "PNG in Chrome shows a different image than in Safari or any desktop app"]]></title><description><![CDATA[
<p>I was wondering why, in my Firefox, the image appears saturated when embedded on the website, but opening it in a new tab by a direct URL shows an unsaturated version. The `img` tag on the website seems to be styled with `mix-blend-mode: multiply`, which makes the image darker because the background is #f0f0f0.</p>
]]></description><pubDate>Sat, 27 Dec 2025 20:54:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46405135</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46405135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46405135</guid></item><item><title><![CDATA[New comment by purplesyringa in "I'm just having fun"]]></title><description><![CDATA[
<p>It's "they" (<a href="https://github.com/jyn514" rel="nofollow">https://github.com/jyn514</a>)</p>
]]></description><pubDate>Mon, 22 Dec 2025 16:37:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46355618</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46355618</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46355618</guid></item><item><title><![CDATA[New comment by purplesyringa in "The Number That Turned Sideways"]]></title><description><![CDATA[
<p>You can: the equation x^2 = x holds for 1, but not for -1, so you can separate them. There is no way to write an equation that holds for i but not for -i without mentioning i (excluding cheating with Im, which again can't be defined without knowing i).</p>
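<p>The standard way to make this precise: complex conjugation is a field automorphism of C that fixes every real number, so for any polynomial p with real coefficients:</p>

```latex
% Conjugation respects + and \times and fixes real coefficients, so
p(i) = 0
\;\Longrightarrow\;
\overline{p(i)} = p\left(\overline{i}\right) = p(-i) = \overline{0} = 0.
% Hence every real-coefficient equation satisfied by i is also satisfied
% by -i, which is exactly why no such equation can separate them.
```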
]]></description><pubDate>Thu, 18 Dec 2025 07:33:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46309887</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46309887</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46309887</guid></item><item><title><![CDATA[New comment by purplesyringa in "Go Proposal: Secret Mode"]]></title><description><![CDATA[
<p>I meeeeean... plenty of functions allocate internally and don't let the user pass in an allocator. So it's not clear to me how to do this at least somewhat universally. You could try to integrate it into the global allocator, I suppose, but then how do you know which allocations to wipe? Should anything allocated in the secret mode be zeroed on free? Or should anything be zeroed if the deallocation happens while in secret mode? Or are both of these necessary conditions? It seems tricky to define rigidly.<p>And stack's the main problem, yeah. It's kind of the main reason why zeroing registers is not enough. That and inter-procedural optimizations.</p>
]]></description><pubDate>Sun, 14 Dec 2025 09:24:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46261861</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46261861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46261861</guid></item></channel></rss>