<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: hairtuq</title><link>https://news.ycombinator.com/user?id=hairtuq</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 22 Apr 2026 08:47:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=hairtuq" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by hairtuq in "When Compiler Optimizations Hurt Performance"]]></title><description><![CDATA[
<p>The mapping after we have the leading 1's count can be done in 3 instructions (in 32-bit math) on either x86 or ARM:<p><pre><code>    t = 0x2020c6 >> x
    return (t << (t & 31)) >> 27</code></pre></p>
]]></description><pubDate>Tue, 21 Oct 2025 19:15:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=45660317</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=45660317</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45660317</guid></item><item><title><![CDATA[New comment by hairtuq in "How we made JSON.stringify more than twice as fast"]]></title><description><![CDATA[
<p>If you want to optimize it a little more, you can combine isLess(s, 0x20) and isChar('"') into isLess(s ^ (kOnes * 0x02), 0x21) (this works since '"' is 0x22).</p>
]]></description><pubDate>Wed, 06 Aug 2025 21:19:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44817987</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=44817987</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44817987</guid></item><item><title><![CDATA[New comment by hairtuq in "A leap year check in three instructions"]]></title><description><![CDATA[
<p>Thanks, fixed.</p>
]]></description><pubDate>Fri, 16 May 2025 05:37:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44002144</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=44002144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44002144</guid></item><item><title><![CDATA[New comment by hairtuq in "Optimizing with Novel Calendrical Algorithms"]]></title><description><![CDATA[
<p>The article leaves out optimizing is_leap_year. Here is an interesting version that is correct (in the Proleptic Gregorian calendar) for 0 <= year <= 102499 and takes only 3 instructions:<p><pre><code>    ((year * 1073750999) & 3221352463) <= 126976
</code></pre>
How this works is surprisingly complex, I am thinking of writing a whole blog post about it...</p>
]]></description><pubDate>Mon, 03 Feb 2025 22:51:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=42924485</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=42924485</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42924485</guid></item><item><title><![CDATA[New comment by hairtuq in "Optimizing uint64_t Digit Counting: A Method that Beats Lemire's by up to 143%"]]></title><description><![CDATA[
<p>Here is a cute variant that doesn't need lzcnt nor tables, but only works up to 99,999:<p><pre><code>    (((x + 393206) & (x + 524188)) ^ ((x + 916504) & (x + 514288))) >> 17
</code></pre>
This is for integer log 10, but could be adapted for number of digits. It needs a wrapper for 64 bit to invoke it multiple times, but most numbers in a JSON are small, so it might even be competitive; it needs only 4 cycles with enough instruction level parallelism.<p>I gathered this idea from the output of a superoptimizer, it was fun to figure out how it works. For spoilers, see [1].<p>[1] <a href="https://github.com/rust-lang/rust/blob/master/library/core/src/num/int_log10.rs">https://github.com/rust-lang/rust/blob/master/library/core/s...</a></p>
]]></description><pubDate>Wed, 08 Jan 2025 06:56:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=42631738</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=42631738</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42631738</guid></item><item><title><![CDATA[New comment by hairtuq in "Bitbanging 1D Reversible Automata"]]></title><description><![CDATA[
<p>The author wonders:<p>> In theory at least, the compiler can see that rule only has 256 values and create a reduced version of ca1d_rule_apply for each value. Whether it actually does is not of much practical concern when the rendering code is the bottle neck. However it’s interesting to see if the compiler can deduce the best solution or whether anything trips it up.<p>The compiler is unlikely to get the optimal result here. The core of this is finding the best instruction sequence for a ternary boolean operation encoded in 8 bits; it's the same job needed for emulating the AVX512F "vpternlog" instruction. This can always be done in at most 5 instructions (or 4 if you have andnot/ornot/xornot), but it's not straightforward to do this. Here is some code that calculates optimal instruction sequences (by letting z3 do the heavy lifting): <a href="https://github.com/falk-hueffner/ternary-logic-optimization">https://github.com/falk-hueffner/ternary-logic-optimization</a></p>
]]></description><pubDate>Thu, 12 Dec 2024 23:58:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=42404680</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=42404680</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42404680</guid></item><item><title><![CDATA[New comment by hairtuq in "Decoding UTF8 with parallel extract"]]></title><description><![CDATA[
<p>Similarly, the eexpect table can be done in 3 instructions with with t = 0x3c783023f + i; return ((t >> (t & 63)) & 0xf0808088.</p>
]]></description><pubDate>Mon, 06 May 2024 18:17:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40277760</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=40277760</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40277760</guid></item><item><title><![CDATA[New comment by hairtuq in "Show HN: Kuboble.com – Minimalistic sliding pieces puzzle game"]]></title><description><![CDATA[
<p>I wrote a solver for Atomix (<a href="https://github.com/falk-hueffner/atomixer">https://github.com/falk-hueffner/atomixer</a>), a similar game. Technically, the only difference is that in Atomix the solution location is not predefined, although probably based on level design this game actually feels quite different. Since the solver just tries all solution location anyway, it would be quite simple to adapt it to this game.</p>
]]></description><pubDate>Wed, 08 Feb 2023 18:46:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=34712757</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=34712757</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34712757</guid></item><item><title><![CDATA[New comment by hairtuq in "A neat XOR trick"]]></title><description><![CDATA[
<p>On a related note, almost any task around shuffling or shifting bits can be improved with xor. For example the straightforward way to remove bit i is<p><pre><code>    mask = -1 << i;
    return (x & ~mask) | ((x >> 1) & mask);
</code></pre>
but with xor we can do it in one less instruction:<p><pre><code>    mask = -1 << i;
    return ((x ^ (x >> 1)) & mask) ^ x;</code></pre></p>
]]></description><pubDate>Tue, 13 Dec 2022 18:48:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=33973795</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=33973795</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33973795</guid></item><item><title><![CDATA[New comment by hairtuq in "After Xinjiang Revelations, Germany's Ties to China Are Under the Microscope"]]></title><description><![CDATA[
<p>It seems you're shifting the goal posts now. The fact that consensus criticisms are not as harsh as you'd like doesn't mean that Israel is "totally off limits". Discussions and criticisms are happening; and the fact that public opinion doesn't match yours can not only be explained by that opinion being suppressed, but also by it being wrong.</p>
]]></description><pubDate>Wed, 01 Jun 2022 19:19:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=31586688</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=31586688</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31586688</guid></item><item><title><![CDATA[New comment by hairtuq in "After Xinjiang Revelations, Germany's Ties to China Are Under the Microscope"]]></title><description><![CDATA[
<p>I don't see that. For example, it has been the official position of the German government for at least 20 years to strongly criticize the Israeli settlements. There's regular critical reports in the press, endless TV discussions and so on. Can you name someone who has actually had to suffer negative consequences for reasonable criticism of Israel?</p>
]]></description><pubDate>Tue, 31 May 2022 13:10:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=31569062</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=31569062</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31569062</guid></item><item><title><![CDATA[New comment by hairtuq in "Show HN: PyIng – Ingredient parser"]]></title><description><![CDATA[
<p>I wrote an ingredient parser that is basically just one giant regular expression: 
<a href="https://github.com/falk-hueffner/metric-cooking/blob/master/metric-cooking.js" rel="nofollow">https://github.com/falk-hueffner/metric-cooking/blob/master/...</a><p>The use case is a bit different (the task includes finding ingredients mentioned in a longer text, and it'll rather not parse something rather than parsing it wrong), but it works fairly OK and can even parse things like "3/4 cup plus 2 tablespoons packed light-brown sugar".</p>
]]></description><pubDate>Tue, 29 Mar 2022 10:38:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=30840882</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=30840882</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30840882</guid></item><item><title><![CDATA[New comment by hairtuq in "Solving NP-hard puzzles with the oldest trick in the book"]]></title><description><![CDATA[
<p>For the quite similar puzzle Atomix, it also seems like the branching factor would be much higher for backward search because upper bounds are weaker, but you can show that on average the branching factor is actually the same [1]. I wonder if the same argument would work here.<p>[1] <a href="http://hueffner.de/falk/hueffner-studienarbeit-atomix.pdf" rel="nofollow">http://hueffner.de/falk/hueffner-studienarbeit-atomix.pdf</a> Section 5.5</p>
]]></description><pubDate>Thu, 14 Oct 2021 15:03:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=28865247</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=28865247</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28865247</guid></item><item><title><![CDATA[New comment by hairtuq in "Impact of Undefined Behavior on Performance"]]></title><description><![CDATA[
<p>You don't necessarily need a wider type (which might be slower to work with), you can just calculate (n|1) * ((n+1)/2). Clang does something much more complicated, probably because for it is just a special case of some much more general optimization.</p>
]]></description><pubDate>Fri, 09 Jul 2021 18:12:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=27786686</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=27786686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27786686</guid></item><item><title><![CDATA[New comment by hairtuq in "Counting the number of matching characters in two ASCII strings"]]></title><description><![CDATA[
<p>Slightly simpler:<p><pre><code>    c = x ^ y;
    return popcount(((c >> 1) | 0x8080808080808080) - c) & 0x8080808080808080);</code></pre></p>
]]></description><pubDate>Sat, 22 May 2021 08:30:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=27244670</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=27244670</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27244670</guid></item><item><title><![CDATA[New comment by hairtuq in "Counting the number of matching characters in two ASCII strings"]]></title><description><![CDATA[
<p>Actually, e is the inverse of the xor, so (e & c1) is guaranteed to be c1, and you still need popcount(c1 & ((e & c2) + c3)).</p>
]]></description><pubDate>Sat, 22 May 2021 07:42:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=27244433</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=27244433</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27244433</guid></item><item><title><![CDATA[New comment by hairtuq in "Bit Twiddling Hacks"]]></title><description><![CDATA[
<p>The name SWAR is sometimes used (<a href="https://en.wikipedia.org/wiki/SWAR" rel="nofollow">https://en.wikipedia.org/wiki/SWAR</a>).<p>Indeed, even 8-bit fields can be added in parallel, using the fact that ^ is like a + that does not produce a carry:<p><pre><code>    uint64_t signmask = 0x8080808080808080;
    uint64_t sum_without_sign_bits = ((x & ~signmask) + (y & ~signmask));
    uint64_t sum_of_sign_bits = (x ^ y) & signmask;
    return sum_without_sign_bits ^ sum_of_sign_bits;</code></pre></p>
]]></description><pubDate>Fri, 04 Dec 2020 10:58:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=25301039</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=25301039</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=25301039</guid></item><item><title><![CDATA[New comment by hairtuq in "Programming with a Read-Eval-Synth Loop [pdf]"]]></title><description><![CDATA[
<p>There is some work on automatically generating data structure implementations from high-level specifications: <a href="https://cozy.uwplse.org/" rel="nofollow">https://cozy.uwplse.org/</a></p>
]]></description><pubDate>Tue, 20 Oct 2020 11:53:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=24836286</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=24836286</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24836286</guid></item><item><title><![CDATA[New comment by hairtuq in "Big O, Little N"]]></title><description><![CDATA[
<p>Since the shift count is implicitly modulo 32, you don't even need the subtraction:<p><pre><code>    bool isvowel(char c) {
        return (1u << (c&31)) & 0x208222;
    }</code></pre></p>
]]></description><pubDate>Sat, 03 Oct 2020 07:18:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=24670253</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=24670253</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24670253</guid></item><item><title><![CDATA[New comment by hairtuq in "Turn recipe websites into plain text"]]></title><description><![CDATA[
<p>I wrote a browser extension that does this (although it will of course occasionally get something wrong): <a href="https://github.com/falk-hueffner/metric-cooking" rel="nofollow">https://github.com/falk-hueffner/metric-cooking</a></p>
]]></description><pubDate>Fri, 26 Jun 2020 08:12:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=23649814</link><dc:creator>hairtuq</dc:creator><comments>https://news.ycombinator.com/item?id=23649814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=23649814</guid></item></channel></rss>