<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: nicula</title><link>https://news.ycombinator.com/user?id=nicula</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 07:17:09 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=nicula" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Claude Code's poor time awareness]]></title><description><![CDATA[
<p>Article URL: <a href="https://nicula.xyz/2026/03/18/time-and-llms.html">https://nicula.xyz/2026/03/18/time-and-llms.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47432270">https://news.ycombinator.com/item?id=47432270</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 18 Mar 2026 22:37:11 +0000</pubDate><link>https://nicula.xyz/2026/03/18/time-and-llms.html</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=47432270</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47432270</guid></item><item><title><![CDATA[Auto-vectorizing operations on buffers of unknown length]]></title><description><![CDATA[
<p>Article URL: <a href="https://nicula.xyz/2025/11/15/vectorizing-unknown-length-loops.html">https://nicula.xyz/2025/11/15/vectorizing-unknown-length-loops.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45938295">https://news.ycombinator.com/item?id=45938295</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 15 Nov 2025 16:02:25 +0000</pubDate><link>https://nicula.xyz/2025/11/15/vectorizing-unknown-length-loops.html</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=45938295</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45938295</guid></item><item><title><![CDATA[New comment by nicula in "Bypassing the Branch Predictor"]]></title><description><![CDATA[
<p>Yeah, it's quite interesting. I don't think I've seen this come up in anything besides that talk, which is about the HFT space.<p>However, I did hear a few times about cases where you'd give up better average performance that comes with spikes (e.g. in terms of response time) in favor of worse average performance that has lower variance.</p>
]]></description><pubDate>Tue, 11 Mar 2025 00:06:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43327694</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43327694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43327694</guid></item><item><title><![CDATA[Bypassing the Branch Predictor]]></title><description><![CDATA[
<p>Article URL: <a href="https://nicula.xyz/2025/03/10/bypassing-the-branch-predictor.html">https://nicula.xyz/2025/03/10/bypassing-the-branch-predictor.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43326043">https://news.ycombinator.com/item?id=43326043</a></p>
<p>Points: 9</p>
<p># Comments: 2</p>
]]></description><pubDate>Mon, 10 Mar 2025 21:01:50 +0000</pubDate><link>https://nicula.xyz/2025/03/10/bypassing-the-branch-predictor.html</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43326043</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43326043</guid></item><item><title><![CDATA[New comment by nicula in "Improving on std:count_if()'s auto-vectorization"]]></title><description><![CDATA[
<p>Great observations, thanks!<p>I wrote the code that you suggested (LMK if I understood your points): <a href="https://godbolt.org/z/jW4o3cnh3" rel="nofollow">https://godbolt.org/z/jW4o3cnh3</a><p>And here's the benchmark output, on my machine: <a href="https://0x0.st/8SsG.txt" rel="nofollow">https://0x0.st/8SsG.txt</a> (v1 is std::count_if(), v2 is the optimization from my blog post, and v3 is what you suggested).<p>v2 is faster, but v3 is still quite fast.</p>
]]></description><pubDate>Sun, 09 Mar 2025 22:34:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=43314621</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43314621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43314621</guid></item><item><title><![CDATA[New comment by nicula in "Improving on std:count_if()'s auto-vectorization"]]></title><description><![CDATA[
<p>Thanks for looking into it!<p>I modified the footnote to get rid of the misleading statements regarding the 'backfiring' of the optimization. :)</p>
]]></description><pubDate>Sun, 09 Mar 2025 21:44:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=43314216</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43314216</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43314216</guid></item><item><title><![CDATA[New comment by nicula in "Improving on std:count_if()'s auto-vectorization"]]></title><description><![CDATA[
<p>> I'm not sure how "it can go the other way around too" -- in that case (assigning to a uint8_t local variable), it seems like that particular optimisation is just not being applied.<p>So the case that you described has 2 layers: the internal std::count_if() layer, which has a 64-bit counter, and the 'return' layer of the count_even_values_v1() function, which has an 8-bit type. In this case, Clang propagates the 8-bit type from the 'return' layer all the way to the inner std::count_if() layer, which effectively means that you're requesting an 8-bit counter, and thus Clang generates the efficient vectorization.<p>However, say that you have the following 3 layers: (1) internal std::count_if() layer with a 64-bit counter; (2) local 8-bit variable layer, to which the std::count_if() result gets assigned; (3) 'return' layer with a 64-bit type. In this case the 64-bit type from layer 3 gets propagated to the inner std::count_if() layer, which will lead to a poor vectorization. Demo: <a href="https://godbolt.org/z/Eo13WKrK4" rel="nofollow">https://godbolt.org/z/Eo13WKrK4</a>. So this downwards type-propagation from the outermost layer into the innermost layer doesn't guarantee optimality. In this case, the optimal propagation would've been from layer 2 down to layer 1 and up to layer 3.<p>Note: I'm not familiar with how the LLVM optimization pass does this exactly, so take this with a huge grain of salt. Perhaps it does indeed 'propagate' the outermost type to the innermost layer. Or perhaps the mere fact that there are more than 2 layers makes the optimization pass not happen at all. Either way, the end result is that the vectorization is poor.</p>
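<p>For readers who don't want to open the demo link, the two shapes described above might look roughly like the sketch below. The function names are hypothetical (they are not taken from the blog post or the Godbolt demo), and the sketch makes no claim about what any particular Clang version emits; it only shows where the 8-bit and 64-bit types sit in each case.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Two layers: std::count_if()'s wide internal counter, then an 8-bit
// return type. (Hypothetical name, for illustration only.)
std::uint8_t count_even_two_layers(const std::vector<std::uint8_t>& values) {
    return static_cast<std::uint8_t>(
        std::count_if(values.begin(), values.end(),
                      [](std::uint8_t v) { return v % 2 == 0; }));
}

// Three layers: wide internal counter, 8-bit local variable, 64-bit
// return type. (Hypothetical name, for illustration only.)
std::uint64_t count_even_three_layers(const std::vector<std::uint8_t>& values) {
    const auto narrow = static_cast<std::uint8_t>(
        std::count_if(values.begin(), values.end(),
                      [](std::uint8_t v) { return v % 2 == 0; }));
    return narrow;  // widened back to 64 bits on return
}
```

<p>Both functions return the count truncated to 8 bits (i.e. modulo 256), so they compute the same value; the only difference is the type of the outermost layer.</p>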
]]></description><pubDate>Sun, 09 Mar 2025 19:39:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43312897</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43312897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43312897</guid></item><item><title><![CDATA[New comment by nicula in "Improving on std:count_if()'s auto-vectorization"]]></title><description><![CDATA[
<p>Some people already mentioned this in the r/cpp discussion. Small correction: 256 is not the correct number of iterations, since if all elements in that slice are even, then your 8-bit counter will wrap around to zero, which can lead to a wrong answer. What you want is 255 iterations.<p>I've looked at the generated assembly for such a solution and it doesn't look great. I'm expecting a significant speed penalty, but I haven't had the time to test it today. Will probably do so tomorrow.</p>
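<p>A minimal sketch of what such a chunked solution could look like, assuming the input is a std::vector of bytes. The helper name count_even_chunked and the constant kChunk are hypothetical, and this has not been benchmarked or checked against the assembly mentioned above.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts even bytes by walking the input in slices
// of at most 255 elements, so the 8-bit partial counter cannot wrap.
std::uint64_t count_even_chunked(const std::vector<std::uint8_t>& values) {
    constexpr std::size_t kChunk = 255;  // largest value a uint8_t can hold
    std::uint64_t total = 0;
    for (std::size_t start = 0; start < values.size(); start += kChunk) {
        const std::size_t end = std::min(start + kChunk, values.size());
        std::uint8_t partial = 0;  // at most 255 increments per slice
        for (std::size_t i = start; i < end; ++i) {
            partial = static_cast<std::uint8_t>(partial + (values[i] % 2 == 0));
        }
        total += partial;  // fold the slice count into the wide total
    }
    return total;
}
```

<p>With a slice size of 256, a slice made up entirely of even values would bring the partial counter back to zero; capping the slice at 255 avoids that.</p>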
]]></description><pubDate>Sun, 09 Mar 2025 18:10:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=43311933</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43311933</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43311933</guid></item><item><title><![CDATA[New comment by nicula in "Improving on std:count_if()'s auto-vectorization"]]></title><description><![CDATA[
<p>Like @wffurr mentioned, this is indeed discussed in a footnote. I just added another remark to the same footnote:<p>"It's also debatable whether or not Clang's 'optimization' results in better codegen in most cases that you care about. The same optimization pass can backfire pretty easily, because it can go the other way around too. For example, if you assigned the `std::count_if()` result to a local `uint8_t` value, but then returned that value as a `uint64_t` from the function, then Clang will assume that you wanted a `uint64_t` accumulator all along, and thus generates the poor vectorization, not the efficient one."</p>
]]></description><pubDate>Sun, 09 Mar 2025 18:06:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=43311897</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43311897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43311897</guid></item><item><title><![CDATA[New comment by nicula in "Improving on std:count_if()'s auto-vectorization"]]></title><description><![CDATA[
<p>haha I didn't even notice that</p>
]]></description><pubDate>Sun, 09 Mar 2025 17:48:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=43311675</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43311675</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43311675</guid></item><item><title><![CDATA[Improving on std:count_if()'s auto-vectorization]]></title><description><![CDATA[
<p>Article URL: <a href="https://nicula.xyz/2025/03/08/improving-stdcountif-vectorization.html">https://nicula.xyz/2025/03/08/improving-stdcountif-vectorization.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43302394">https://news.ycombinator.com/item?id=43302394</a></p>
<p>Points: 134</p>
<p># Comments: 45</p>
]]></description><pubDate>Sat, 08 Mar 2025 18:44:19 +0000</pubDate><link>https://nicula.xyz/2025/03/08/improving-stdcountif-vectorization.html</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43302394</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43302394</guid></item><item><title><![CDATA[New comment by nicula in "A Clang regression related to switch statements and inlining"]]></title><description><![CDATA[
<p>> So it's not a Clang regression per se, it's an issue with the LLVM core?<p>Yes.<p>> If you run LLVM 18's `opt` on bytecode generated by Clang 19 and then compile it, does it also generate the same bad assembly?<p>No. If you pass the LLVM IR bitcode generated by Clang 18 to Clang 19, then the assembly is good.<p>I called it a 'Clang regression' in the sense that the way in which I discovered and tested this difference in performance was via Clang. So from a typical user's perspective (who doesn't care about the inner workings and distinct components of Clang), this is a 'Clang regression'.</p>
]]></description><pubDate>Wed, 26 Feb 2025 18:15:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=43186355</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43186355</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43186355</guid></item><item><title><![CDATA[New comment by nicula in "A Clang regression related to switch statements and inlining"]]></title><description><![CDATA[
<p>This issue doesn't require <i>large</i> switch tables in order to show up. Even if you have 4 cases and the rest of them are default'ed, Clang 18 optimizes that to a switch, while Clang 19 does the (potentially) inefficient labels+jumps approach: <a href="https://godbolt.org/z/Y6njP8j38" rel="nofollow">https://godbolt.org/z/Y6njP8j38</a><p>This whole investigation started because I was writing some Rust code with a couple of small `match`es, and for some reason they weren't being optimized to a lookup table. I wrote a more minimal reproduction of that issue in C++ and eventually found the Clang regression. Since Rust also uses LLVM, `match`es suffer from the same regression (depending on which Rust version you're using).</p>
]]></description><pubDate>Sun, 23 Feb 2025 13:11:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43148998</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43148998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43148998</guid></item><item><title><![CDATA[Getting rid of unwanted branches in C/C++ code with __builtin_unreachable()]]></title><description><![CDATA[
<p>Article URL: <a href="https://nicula.xyz/2025/02/23/unwanted-branches.html">https://nicula.xyz/2025/02/23/unwanted-branches.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43148694">https://news.ycombinator.com/item?id=43148694</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 23 Feb 2025 12:03:29 +0000</pubDate><link>https://nicula.xyz/2025/02/23/unwanted-branches.html</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43148694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43148694</guid></item><item><title><![CDATA[New comment by nicula in "A Clang regression related to switch statements and inlining"]]></title><description><![CDATA[
<p>Oh wow didn't know about this. Somebody else posted it for me. Sorry for the dup:)</p>
]]></description><pubDate>Sun, 23 Feb 2025 12:01:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43148680</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43148680</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43148680</guid></item><item><title><![CDATA[A Clang regression related to switch statements and inlining]]></title><description><![CDATA[
<p>Article URL: <a href="https://nicula.xyz/2025/02/16/clang-and-big-switches.html">https://nicula.xyz/2025/02/16/clang-and-big-switches.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43107244">https://news.ycombinator.com/item?id=43107244</a></p>
<p>Points: 21</p>
<p># Comments: 3</p>
]]></description><pubDate>Wed, 19 Feb 2025 20:31:34 +0000</pubDate><link>https://nicula.xyz/2025/02/16/clang-and-big-switches.html</link><dc:creator>nicula</dc:creator><comments>https://news.ycombinator.com/item?id=43107244</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43107244</guid></item></channel></rss>