<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dzaima</title><link>https://news.ycombinator.com/user?id=dzaima</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 20:57:42 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dzaima" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dzaima in "Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets"]]></title><description><![CDATA[
<p>Unfortunately, that's only vector, and ≤16-bit ints at that, no 32-bit ints; and as the other reply says, nearly non-existent multiply-high which generally makes vectorized div-by-const its own mini-hell (but doing a 2x-width multiply with fixups is still better than the OP 4x-width method).<p>(...though, x86 does have (v)pmulhw for 16-bit input, so for 16-bit div-by-const the saturating option works out quite well.)<p>(And, for what it's worth, the lack of 8-bit multiplies on x86 means that the OP method of high-half-of-4x-width-multiply works out nicely for vectorizing dividing 8-bit ints too)</p>
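<p>A minimal scalar sketch of the 8-bit case, in the spirit of the high-half-of-4x-width-multiply idea above and not the article's exact code. The helper names `div8_magic` and `div8_by_const` are made up for illustration, and the multiplier floor(2^16 / d) + 1 is an assumption that holds for 8-bit inputs because x * d stays below 2^16:</p>

```c
#include <stdint.h>

/* Hypothetical helper: multiplier for dividing any 8-bit value by d,
   where 1 <= d <= 255. Computed as floor(2^16 / d) + 1. */
static uint32_t div8_magic(uint8_t d) {
    return 65536u / d + 1u;
}

/* Hypothetical helper: x / d taken as the high 16 bits of a 32-bit
   (4x-width) product; the product never overflows 32 bits. */
static uint8_t div8_by_const(uint8_t x, uint32_t magic) {
    return (uint8_t)(((uint32_t)x * magic) >> 16);
}
```

<p>In a vectorized version the multiplier would be computed once per divisor and the multiply/shift applied per lane; that part is not shown here.</p>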
]]></description><pubDate>Mon, 13 Apr 2026 10:54:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47750255</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47750255</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47750255</guid></item><item><title><![CDATA[New comment by dzaima in "Git commands I run before reading any code"]]></title><description><![CDATA[
<p>jj's template and revset languages are very simple syntactically, so once you're comfortable with the few things you do use often it's just a question of learning about the other existing functions (even if only enough to know to look them up), which slot right in and compose well with everything else you know (unlike flags, which typically each have their own system).<p>Or, perhaps better yet, defining your own functions/helpers as you go for things you might care about, which, by virtue of having been named by you, are much easier to remember (and still compose nicely).</p>
]]></description><pubDate>Wed, 08 Apr 2026 16:22:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47692367</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47692367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47692367</guid></item><item><title><![CDATA[New comment by dzaima in "Two studies in compiler optimisations"]]></title><description><![CDATA[
<p>> It surprises me that the compiler doesn't still take the inference from the assert and just disable emitting the code to perform the check.<p>That's because that's what the <assert.h> assert() must do; it's specified to do and imply nothing when assertions are disabled. (the standard literally fully defines it as `#define assert(...) ((void)0)` when NDEBUG is defined)<p>Whereas `[[assume(...)]]` is a thing specifically for that "infer things from this without actually emitting any code".</p>
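<p>A rough sketch of the difference in C. The `ASSUME` macro is a hypothetical stand-in for `[[assume(...)]]` built on the GCC/Clang `__builtin_unreachable()` builtin; the function names are made up for illustration, and whether the optimizer actually exploits the hint depends on the compiler and flags:</p>

```c
#include <assert.h>

/* Hypothetical helper macro: lets GCC/Clang treat the condition as true
   when optimizing. If the condition is ever false, behavior is undefined.
   On other compilers it falls back to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)
#endif

/* With NDEBUG defined, assert() expands to ((void)0), so the compiler
   learns nothing about x and must handle negative values of x. */
static int mod16_assert(int x) {
    assert(x >= 0);
    return x % 16;
}

/* Here the optimizer is allowed to use x >= 0, e.g. to turn the
   signed remainder into a plain bitwise AND. */
static int mod16_assume(int x) {
    ASSUME(x >= 0);
    return x % 16;
}
```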
]]></description><pubDate>Thu, 26 Mar 2026 11:59:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47529397</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47529397</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47529397</guid></item><item><title><![CDATA[New comment by dzaima in "The future of version control"]]></title><description><![CDATA[
<p>In git if you, say, do some `git rebase -i`, edit some commit, continue the rebase, hit a conflict, and realize you edited something wrong that caused the conflict, your only option is aborting the entire rebase, starting over, and redoing all the changes you made.<p>In jj, you just get a conflict in the descendants, and if you edit the past to no longer conflict the conflict disappears; kinda as if you were always in interactive rebase but at all points have the knowledge of what the future would look like if you `git rebase --continue`d.<p>Also really nice for reordering commits, which can result in conflicts but leave descendants non-conflicting, allowing you to delay resolving the conflicts until after doing other stuff, or to continue reordering instead of always starting from scratch as with `git rebase -i`.</p>
]]></description><pubDate>Sun, 22 Mar 2026 22:16:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47482803</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47482803</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47482803</guid></item><item><title><![CDATA[New comment by dzaima in "The future of version control"]]></title><description><![CDATA[
<p>But you do have the op log, giving you a full copy of the log (incl. the contents of the workspace) at every operation, so you can get out of such mistakes with some finagling.<p>You can choose to have a workflow where you're never directly editing any commit to "gain back autonomy" of the working copy; and if you really want to, with some scripting, you can even emulate a staging area with a specially-formatted commit below the working copy commit.</p>
]]></description><pubDate>Sun, 22 Mar 2026 19:57:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47481454</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47481454</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47481454</guid></item><item><title><![CDATA[New comment by dzaima in "Google details new 24-hour process to sideload unverified Android apps"]]></title><description><![CDATA[
<p>> as it's an app from a verified developer.<p>Well that's if they go through the verification process, which does not seem like a thing they'd want to do - <a href="https://f-droid.org/en/2026/02/24/open-letter-opposing-developer-verification.html" rel="nofollow">https://f-droid.org/en/2026/02/24/open-letter-opposing-devel...</a></p>
]]></description><pubDate>Thu, 19 Mar 2026 23:55:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47448270</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47448270</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47448270</guid></item><item><title><![CDATA[New comment by dzaima in "The unlikely story of Teardown Multiplayer"]]></title><description><![CDATA[
<p>As far as I know, the ARM (at least aarch64) situation should be about the same as x86-64. Anything specific that's bad about it? (there's aarch32 NEON with no subnormal support or whatever, but you can just not use it if determinism is the goal)<p>That RECIP14 link is AVX-512, i.e. not available on a bunch of hardware (incl. the newest Intel client CPUs), so you wouldn't ever use it in a deterministic-simulation multiplayer game anyway, even if you restrict yourself to x86-64-only; so you're still stuck with the basic IEEE-754 ops even on x86-64.<p>x86-64 is worse than aarch64 in a very important aspect - baseline x86-64 doesn't have fused multiply-add, whereas aarch64 does (granted, the x86-64 FMA extension came out not far from aarch64/armv8, but it's still a concern, such is life). Of course you can choose to not use fma, but that's throwing perf away. (regardless you'll want -ffp-contract=off or equivalent to make sure compiler optimizations don't screw things up, so any such will need to be manual fma calls anyway)</p>
]]></description><pubDate>Tue, 17 Mar 2026 13:52:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47412692</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47412692</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47412692</guid></item><item><title><![CDATA[New comment by dzaima in "Prefix sums at gigabytes per second with ARM NEON"]]></title><description><![CDATA[
<p>> This means that all medium price or high price smartphones that were introduced during the last 4 years have SVE2 support.<p>Except Qualcomm chipsets, which disable SVE even if all ARM cores used support it. ("Snapdragon 8 Elite Gen 5" supposedly finally supports SVE? but that's like only half a year old)</p>
]]></description><pubDate>Fri, 13 Mar 2026 12:06:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47363315</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47363315</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47363315</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>Indeed we shall hope heuristics update; but of course if no compilers emit it hardware has no reason to actually bother making fast misaligned ops, so it's primed for going wrong.</p>
]]></description><pubDate>Thu, 12 Mar 2026 13:33:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47350321</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47350321</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47350321</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>> No, it was released to customers in June 2021, almost five years ago.<p>Ah, okay. (still, like, at least a couple decades newer than the last x86-64 chip with slow unaligned mem ops, if such ever existed at all? Haven't heard of / can't find anything saying any aarch64 ever had problems with them either, so still much worse for the RISC-V side).<p>Well, I suppose we can hope that business/politics messes will all never happen again and won't affect anything RVA23.</p>
]]></description><pubDate>Thu, 12 Mar 2026 13:33:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47350311</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47350311</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47350311</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>P550 is, like, what, only a year old? I suppose there has been some laughing at it at least.<p>Also Kendryte K230 / C908, but only on vector mem ops, which adds a whole other mess onto this.<p>I'd <i>hope</i> all the massive OoO will have fast misaligned mem ops, anything else would immediately cause infinite pain for decades.<p>But of course there'll be plenty of RVA23 hardware that's much smaller eventually too, once it becomes a general expectation instead of "cool thing for the very-top-end to have".<p>I do agree that it'd be reasonable to just assume fast misaligned ops, but for whatever reason gcc and clang just don't, and that's what we have for defaults.</p>
]]></description><pubDate>Wed, 11 Mar 2026 23:54:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344265</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47344265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344265</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>Of course that'd result in entirely-avoidable slowdown for the potentially-misaligned ops. Perhaps fine for a program that doesn't use them frequently, but quite bad for ones that need misaligned ops everywhere.<p>In terms of correctness, there's also the possibility of partially-misaligned ops (e.g. an 8B load with 4B alignment, loading two adjacent int32_t fields) so you're not handling everything with correct faults anyways.</p>
]]></description><pubDate>Wed, 11 Mar 2026 17:34:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47338611</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47338611</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47338611</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>2 is basically infeasible with RISC-V being intended for a wide range of use-cases. 1 might be ok but introduces a bunch of opcode space waste.<p>Indeed extremely sad that Zicclsm wasn't a thing in the spec from the very start (never mind that even now it only lives in the profiles spec); going through the git history, it seems that the text around misaligned handling optionality goes all the way back to the very start of the riscv/riscv-isa-manual repo, before `Z*` extensions existed at all.<p>More broadly, it's rather sad that there aren't similar extensions for other forms of optional behavior (one thing that was recently brought up is RVV vsetvli with e.g. `e64,mf2`, useful for massive-VLEN>DLEN hardware).</p>
]]></description><pubDate>Wed, 11 Mar 2026 14:44:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336251</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47336251</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336251</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>Yeah, that is quite funky; and indeed gcc does that. Relatedly, super-annoying is that `vle64.v` & co could then also make use of that same hardware, but that's not guaranteed. (I suppose there could be awful hardware that does vle8.v via single-byte loads, which wouldn't translate to vle64.v?)</p>
]]></description><pubDate>Wed, 11 Mar 2026 13:00:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47335004</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47335004</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47335004</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>There is Zmmul for multiplication-but-not-divide.</p>
]]></description><pubDate>Wed, 11 Mar 2026 12:57:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47334979</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47334979</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47334979</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>Well, we don't necessarily have to wait for Oilsm; software that wants to could just choose to be opinionated and run massively-worse on suboptimal hardware. And, of course, once Oilsm hardware becomes the standard, it'd be fine to recompile RVA23-targeting software to it too.<p>> RISC-V could've easily avoided all this mess by properly mandating misaligned pointer handling as part of the I extension.<p>Rather hard to mandate performance by an open ISA. Especially considering that there could actually be scenarios where it may be necessary to chicken-bit it off; and of course the fact that there's already some questionability on ops crossing pages, where even ARM/x86 are very slow.</p>
]]></description><pubDate>Wed, 11 Mar 2026 11:36:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47334284</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47334284</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47334284</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>> How is that different for RISC-V?<p>RISC-V hardware with slow misaligned mem ops does exist to a non-insignificant extent, and it seems not enough people have laughed at it, and instead compilers did just surrender and default to not using them.<p>> As you observed there's a feedback loop between what compilers output and what gets optimised in hardware.<p>Well, that loop needs to start somewhere, and it has already started, and started wrong. I suppose we'll see what happens with real RVA23 hardware; at the very least, even if it takes a decade for most hardware to support misaligned well, software could retroactively change its defaults while still remaining technically-RVA23-compatible, so I suppose that's good.</p>
]]></description><pubDate>Wed, 11 Mar 2026 10:50:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333949</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47333949</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333949</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>I don't think x86/ARM particularly guarantee fastness, but at least they effectively encourage making use of them via their contributions to compilers that do. They also don't really need to given that they mostly control who can make hardware anyway. (at the very least, if general-purpose HW with horribly-slow misaligned loads/stores came out from them, people would laugh at it, and assume/hope that that's because of some silicon defect requiring chicken-bit-ing it off, instead of just not bothering to implement it)<p>Indeed one can make any instruction take basically-forever, but I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation (on at least some inputs), else having said instruction is strictly-redundant.<p>And if any significant general-purpose hardware actually did a 10k-cycle div around the time the respective compiler defaults were decided, I think there's a good chance that software would have defaulted to calling division through a function such that an implementation can be picked depending on the running hardware. (let's ignore whether 10k-cycle-division and general-purpose-hardware would ever go together... but misaligned-mem-ops+general-purpose-hardware definitely do)</p>
]]></description><pubDate>Wed, 11 Mar 2026 09:33:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333454</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47333454</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333454</guid></item><item><title><![CDATA[New comment by dzaima in "RISC-V Is Sloooow"]]></title><description><![CDATA[
<p>The option to generate or not generate misaligned loads/stores does exist (-mno-strict-align / -mstrict-align). But of course that's a compile-time option, and of course the preferred state would be to have use of them <i>on</i> by default, but RVA23 doesn't sufficiently guarantee/encourage them not being unreasonably-slow, leaving native misaligned loads/stores still effectively-unusable (and off by default on clang/gcc on -march=rva23u64).<p>aka, Zicclsm / RVA23 are entirely-useless as far as actually getting to make use of native misaligned loads/stores goes.</p>
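<p>For illustration, a minimal sketch of the kind of source code this affects. The function name `load_u32_unaligned` is made up; the assumption is that the compiler lowers the fixed-size `memcpy` either to one native (possibly misaligned) load or to a sequence of smaller loads plus shifts, depending on whether it was told that misaligned accesses are acceptable:</p>

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address with no alignment guarantee.
   Using memcpy keeps this well-defined C; which instructions get
   emitted is left to the compiler and its alignment settings. */
static uint32_t load_u32_unaligned(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

<p>Comparing the generated assembly under -mstrict-align and -mno-strict-align for a RISC-V target is one way to see what the default costs.</p>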
]]></description><pubDate>Wed, 11 Mar 2026 08:57:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333202</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47333202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333202</guid></item><item><title><![CDATA[New comment by dzaima in "Decimal-Java is a library to convert java.math.BigDecimal to and from IEEE-754r"]]></title><description><![CDATA[
<p>The type seems to just be a small wrapper around a BigDecimal; the actual conversion arithmetic will presumably be very slow by comparison regardless, so a single extra allocation (in addition to BigDecimal's ≥3) won't change much.</p>
]]></description><pubDate>Tue, 24 Feb 2026 17:17:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47139710</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47139710</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47139710</guid></item></channel></rss>