<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: purplesyringa</title><link>https://news.ycombinator.com/user?id=purplesyringa</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 08:35:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=purplesyringa" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by purplesyringa in "Mark's Magic Multiply"]]></title><description><![CDATA[
<p>And here are the constants for the ostensibly more useful 54x54->108 product: <a href="https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=cbedb2bcf918107fe7565ba2397edda2" rel="nofollow">https://play.rust-lang.org/?version=stable&mode=release&edit...</a><p>I tried to go even higher, but the bounds seem to break at 55 bits.</p>
]]></description><pubDate>Tue, 14 Apr 2026 01:14:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47760038</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47760038</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47760038</guid></item><item><title><![CDATA[New comment by purplesyringa in "Mark's Magic Multiply"]]></title><description><![CDATA[
<p>I think it fails because it seems like the difference between 32-bit and 64-bit floats is 2x, but in reality we should look at the mantissa, where the increase from 23 bits to 52 bits is much greater.<p>That said, I managed to tweak this method to work with 3 multiplications.<p>ETA: I just realized you wanted to use 32x32 -> 64 products, while my approach assumes the existence of 64x64 -> 64 products; basically it's just a scaled-up version of the original question and likely not what you're looking for. Hopefully it's still useful though.<p>First, remove the bottom 8 bits of the two inputs and compute the 44x44->88 product. This can be done with the approach in the post. Then apply the algorithm again, combining that product with the low 64 bits of the full modular product to get the full 52x52->104 output. The bounds are a bit tight, but it should work. Here's a numeric example:<p><pre><code>    a = 98a67ee86f8cf
    b = da19d2c9dfe71

    (a >> 20) * (b >> 20)         = 820d2e04637bf428
    (a >> 8) * (b >> 8) % 2**64   =       0547f8cdb2100210
    ->
    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210

    (a >> 8) * (b >> 8)           = 820d2e0547f8cdb2100210
    (a * b) % 2**64               =           080978075f64355f
    ->
    a * b                         = 820d2e0548080978075f64355f
</code></pre>
And my attempt at implementation: <a href="https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=56dbb7f56d1e3cf5c4af6a5d5ffa7e06" rel="nofollow">https://play.rust-lang.org/?version=stable&mode=release&edit...</a></p>
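<p>The stitching step above can be sketched in Rust. Note this uses an exact integer product for the 88-bit half in place of the float-derived one, so it only demonstrates the carry resolution, not the float trick itself:</p>

```rust
// Combine the exact 88-bit product (a >> 8) * (b >> 8) with the low
// 64 bits of a * b (what a modular 64x64 -> 64 multiply returns) to
// reconstruct the full 104-bit product of two 52-bit inputs. The
// shifted high product underestimates a * b by less than 2^62, so its
// top 40 bits are off by at most one carry, which the exact low bits
// disambiguate.
fn stitch(p_hi: u128, lo: u64) -> u128 {
    let approx = p_hi << 16; // aligns (a >> 8) * (b >> 8) with a * b
    let mut hi = approx >> 64; // candidate top 40 bits
    if lo < approx as u64 {
        hi += 1; // the exact low 64 bits wrapped around: propagate a carry
    }
    (hi << 64) | lo as u128
}

fn main() {
    // The 52-bit inputs from the numeric example above.
    let a: u64 = 0x98a67ee86f8cf;
    let b: u64 = 0xda19d2c9dfe71;
    let p_hi = (a >> 8) as u128 * (b >> 8) as u128;
    let lo = a.wrapping_mul(b);
    assert_eq!(stitch(p_hi, lo), a as u128 * b as u128);
}
```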
]]></description><pubDate>Mon, 13 Apr 2026 19:58:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47757081</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47757081</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47757081</guid></item><item><title><![CDATA[New comment by purplesyringa in "Programming Used to Be Free"]]></title><description><![CDATA[
<p>> Very tangential, but I could swear QBasic included an on-disk documentation system accessible from the editor. Maybe only later versions?<p>Perhaps my installation didn't include it, or maybe you're confusing it with QuickBASIC, a more feature-complete IDE with a compiler (instead of just an interpreter). I don't exactly remember.</p>
]]></description><pubDate>Mon, 13 Apr 2026 16:56:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47754878</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47754878</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47754878</guid></item><item><title><![CDATA[New comment by purplesyringa in "Mark's Magic Multiply"]]></title><description><![CDATA[
<p>I don't think it's possible to apply this trick to 64-bit floats on a 64-bit architecture, which OP mentions in the last sentence. You need a 52 x 52 -> 104 product. Modular 64 x 64 -> 64 multiplication gives you the 64 bottom bits exactly, and widening 32 x 32 -> 64 multiplication approximately gives you the top 32 bits. That leaves 104 - 64 - 32 = 8 bits that are not accounted for at all. Compare with the 32-bit case, where the same arithmetic gives 46 - 32 - 16 = -2, i.e. a 2-bit overlap the method relies on.</p>
]]></description><pubDate>Mon, 13 Apr 2026 15:40:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47753626</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47753626</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47753626</guid></item><item><title><![CDATA[New comment by purplesyringa in "Programming Used to Be Free"]]></title><description><![CDATA[
<p>You can still write code without LLMs, much like you can write code without modern IDEs, or use C and assembly instead of higher-level languages. But there are significant differences between the skills you learn in the process, which I believe inhibits upward mobility.</p>
]]></description><pubDate>Mon, 13 Apr 2026 13:24:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47751628</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47751628</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47751628</guid></item><item><title><![CDATA[New comment by purplesyringa in "All elementary functions from a single binary operator"]]></title><description><![CDATA[
<p>The formulas are provided in the supplementary information file, as mentioned in the paper. <a href="https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformation.pdf" rel="nofollow">https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformat...</a> You want page 9.</p>
]]></description><pubDate>Mon, 13 Apr 2026 06:47:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47748515</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47748515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47748515</guid></item><item><title><![CDATA[New comment by purplesyringa in "Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets"]]></title><description><![CDATA[
<p>I must admit I'm surprised to see this -- Lemire offhandedly mentioned in the famous remainder blog post (<a href="https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/" rel="nofollow">https://lemire.me/blog/2019/02/08/faster-remainders-when-the...</a>) that 64-bit constants can be used for 32-bit division, and even provided a short example to compute the remainder that way (though not the quotient). Looking a bit more, it seems like libdivide didn't integrate this optimization either.<p>I <i>guess</i> everyone just assumed this was so well known by now that compilers had certainly integrated it, and no one actually bothered to submit a patch until now, when it was reinvented?</p>
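<p>For reference, the remainder computation from that post can be sketched as follows (a transcription of Lemire's fastmod idea; treat it as an illustration rather than the exact code from the post):</p>

```rust
// 32-bit remainder by a constant divisor d > 1 via a 64-bit magic
// constant c = ceil(2^64 / d): n mod d is the high 64 bits of
// (c * n mod 2^64) * d.
fn fastmod_u32(n: u32, d: u32) -> u32 {
    debug_assert!(d > 1);
    let c: u64 = u64::MAX / d as u64 + 1; // ceil(2^64 / d) for d > 1
    let lowbits = c.wrapping_mul(n as u64);
    ((lowbits as u128 * d as u128) >> 64) as u32
}

fn main() {
    assert_eq!(fastmod_u32(123456789, 10), 9);
    assert_eq!(fastmod_u32(u32::MAX, 7), u32::MAX % 7);
}
```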
]]></description><pubDate>Mon, 13 Apr 2026 06:27:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47748329</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47748329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47748329</guid></item><item><title><![CDATA[New comment by purplesyringa in "Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets"]]></title><description><![CDATA[
<p>The paper doesn't require a bitshift after multiplication -- it directly uses the high half of the product as the quotient, so it saves at least one tick over the solution you mentioned. And on x86, saturating addition can't be done in a tick and 32->64 zero-extension is implicit, so the gap is even wider.</p>
]]></description><pubDate>Mon, 13 Apr 2026 06:20:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47748291</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47748291</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47748291</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>Honorable mention: byte swapping instructions (originally added to CPUs for endianness conversion) can also be used to redistribute entropy, but they're slightly slower than rotations on Intel, which is why I think they aren't utilized much.</p>
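<p>A toy Rust illustration (swap_bytes compiles to a single bswap on x86; the values are made up):</p>

```rust
fn main() {
    // Entropy confined to the low byte...
    let h: u64 = 0x00000000000000ab;
    // ...moves to the top byte after a byte swap, where it would
    // survive a "take the top bits" bucket index.
    assert_eq!(h.swap_bytes(), 0xab00000000000000);
    // A rotation redistributes it similarly, typically faster on Intel.
    assert_eq!(h.rotate_left(56), 0xab00000000000000);
}
```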
]]></description><pubDate>Sun, 12 Apr 2026 10:11:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737967</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737967</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737967</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>I think the reason real-world implementations don't do this is to speed up access when the key is a small integer. Say, if your IDs are spread uniformly between 1 and 1000, taking the bottom 7 bits is a great hash, while the top 7 bits would just be zeros. So it's optimizing for a trivial hash rather than a general-purpose fast hash.<p>And since most languages require each data type to provide its own hash function, you kind of have to assume that the hash is half-assed and bottom bits are better. I think only Rust could make decisions differently here, since it's parametric over hashers, but I haven't seen that done.</p>
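<p>A minimal Rust sketch of the trade-off, with the identity function standing in for a language's trivial integer hash:</p>

```rust
use std::collections::HashSet;

fn main() {
    // IDs spread uniformly between 1 and 1000, "hashed" with the identity.
    // Bottom 7 bits: all 128 buckets of a 128-slot table get used.
    let low: HashSet<u64> = (1u64..=1000).map(|id| id & 127).collect();
    assert_eq!(low.len(), 128);
    // Top 7 bits of the 64-bit hash: a single bucket (always zero).
    let high: HashSet<u64> = (1u64..=1000).map(|id| id >> 57).collect();
    assert_eq!(high.len(), 1);
}
```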
]]></description><pubDate>Sun, 12 Apr 2026 09:11:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737529</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737529</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737529</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>I work at the machine code level, so the only characteristic I'm interested in is how many ticks it takes to compute the result, not how many transistors it requires or anything like that. All modern CPUs take 1 tick to compute XOR, addition, and many other simple arithmetic operations alike, so even though addition is technically more complicated in CPU designs, that complexity never surfaces in software. In the context of this post, I preferred addition over XOR to reduce cancel-out and propagate entropy between bits.</p>
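<p>To illustrate the cancel-out and carry-propagation point with a contrived example:</p>

```rust
fn main() {
    let (x, y) = (0b0111u8, 0b0001u8);
    // XOR treats every bit position independently: no propagation.
    assert_eq!(x ^ y, 0b0110);
    // Addition lets a carry ripple out of the low bits into a higher one.
    assert_eq!(x + y, 0b1000);
    // And XOR of correlated inputs can cancel entropy entirely.
    assert_eq!(x ^ x, 0);
}
```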
]]></description><pubDate>Sun, 12 Apr 2026 09:04:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737492</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737492</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737492</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>Yes, that's my point. It's not true that <i>all</i> hash functions have this characteristic, but most fast ones do. (And if you're using a slow-and-high-quality hash function, the distinction doesn't matter, so might as well use top bits.)</p>
]]></description><pubDate>Sun, 12 Apr 2026 09:01:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737473</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737473</guid></item><item><title><![CDATA[New comment by purplesyringa in "Simplest Hash Functions"]]></title><description><![CDATA[
<p>Djb2 is hardly a proven good hash :) It's really easy to find collisions for it, and it's not seeded, so you're kind of screwed regardless. It's the odd middle ground between "safely usable in practice" and "fast in practice", which turns out to be "neither safe nor fast" in this case.</p>
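<p>To make the "easy to find collisions" claim concrete: in the h * 33 + c variant of djb2, two-character strings collide whenever 33 * c1 + c2 matches, and any shared suffix preserves the collision (a toy Rust check, not from the thread):</p>

```rust
// The additive variant of djb2: h = h * 33 + c, seeded with 5381.
fn djb2(s: &[u8]) -> u64 {
    s.iter()
        .fold(5381u64, |h, &c| h.wrapping_mul(33).wrapping_add(c as u64))
}

fn main() {
    // 33 * b'a' + b'b' == 33 * b'b' + b'A' == 3299, hence a collision:
    assert_eq!(djb2(b"ab"), djb2(b"bA"));
    // Equal intermediate states stay equal under any common suffix.
    assert_eq!(djb2(b"abfoo"), djb2(b"bAfoo"));
}
```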
]]></description><pubDate>Sun, 12 Apr 2026 08:59:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737468</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47737468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737468</guid></item><item><title><![CDATA[An alternative derivation of Shannon entropy]]></title><description><![CDATA[
<p>Article URL: <a href="https://iczelia.net/posts/shannon-deriv/">https://iczelia.net/posts/shannon-deriv/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47431869">https://news.ycombinator.com/item?id=47431869</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 18 Mar 2026 21:51:15 +0000</pubDate><link>https://iczelia.net/posts/shannon-deriv/</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47431869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47431869</guid></item><item><title><![CDATA[New comment by purplesyringa in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>"nobody cares about BigInt addition performance" is an odd claim to make when half of the world's cryptography is based on ECC.</p>
]]></description><pubDate>Mon, 16 Mar 2026 21:38:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47405284</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47405284</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47405284</guid></item><item><title><![CDATA[New comment by purplesyringa in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>I suspect that LLVM is optimized for compiling with `-ftrapv`, perhaps for cheap sanitizing or maybe just due to design decisions like using unsigned integers everywhere (please correct me if I'm wrong). I'm personally interested in how RISC-V behaves on computational tasks where computing carry is a known bottleneck, like long addition. Maybe looking at libgmp could be interesting, though I suspect absolute numbers will not be meaningful, and there's no baseline to compare them to.</p>
]]></description><pubDate>Thu, 12 Mar 2026 05:40:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47346922</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=47346922</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47346922</guid></item><item><title><![CDATA[New comment by purplesyringa in "PNG in Chrome shows a different image than in Safari or any desktop app"]]></title><description><![CDATA[
<p>I was wondering why, in my Firefox, the image appears saturated when embedded on the website, but opening it in a new tab by a direct URL shows an unsaturated version. The `img` tag on the website seems to be styled with `mix-blend-mode: multiply`, which makes the image darker because the background is #f0f0f0.</p>
]]></description><pubDate>Sat, 27 Dec 2025 20:54:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46405135</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46405135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46405135</guid></item><item><title><![CDATA[New comment by purplesyringa in "I'm just having fun"]]></title><description><![CDATA[
<p>It's "they" (<a href="https://github.com/jyn514" rel="nofollow">https://github.com/jyn514</a>)</p>
]]></description><pubDate>Mon, 22 Dec 2025 16:37:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46355618</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46355618</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46355618</guid></item><item><title><![CDATA[New comment by purplesyringa in "The Number That Turned Sideways"]]></title><description><![CDATA[
<p>You can: the equation x^2 = x holds for 1, but not for -1, so you can separate them. There is no way to write an equation that holds for i but not for -i without mentioning i (excluding cheating with Im, which again can't be defined without knowing i).</p>
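<p>The standard way to make this precise: complex conjugation is a field automorphism of C that fixes every real number, so for any polynomial p with real coefficients:</p>

```latex
% Conjugation respects + and \times and fixes real coefficients, so
p(i) = 0
\;\Longrightarrow\;
\overline{p(i)} = p\left(\overline{i}\right) = p(-i) = \overline{0} = 0.
% Hence every real-coefficient equation satisfied by i is also satisfied
% by -i, which is exactly why no such equation can separate them.
```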
]]></description><pubDate>Thu, 18 Dec 2025 07:33:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46309887</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46309887</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46309887</guid></item><item><title><![CDATA[New comment by purplesyringa in "Go Proposal: Secret Mode"]]></title><description><![CDATA[
<p>I meeeeean... plenty of functions allocate internally and don't let the user pass in an allocator. So it's not clear to me how to do this at least somewhat universally. You could try to integrate it into the global allocator, I suppose, but then how do you know which allocations to wipe? Should anything allocated in the secret mode be zeroed on free? Or should anything be zeroed if the deallocation happens while in secret mode? Or are both of these necessary conditions? It seems tricky to define rigidly.<p>And stack's the main problem, yeah. It's kind of the main reason why zeroing registers is not enough. That and inter-procedural optimizations.</p>
]]></description><pubDate>Sun, 14 Dec 2025 09:24:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46261861</link><dc:creator>purplesyringa</dc:creator><comments>https://news.ycombinator.com/item?id=46261861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46261861</guid></item></channel></rss>