<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: stephencanon</title><link>https://news.ycombinator.com/user?id=stephencanon</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 13 Jun 2026 06:30:28 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=stephencanon" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by stephencanon in "Swift at Apple: Migrating the TrueType hinting interpreter"]]></title><description><![CDATA[
<p>The work discussed in this post shipped in the OS last year (fall 2025), so nothing here is dependent on very recent changes.</p>
]]></description><pubDate>Sat, 13 Jun 2026 02:05:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48511811</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=48511811</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48511811</guid></item><item><title><![CDATA[New comment by stephencanon in "Why isn't the U.S. better at soccer?"]]></title><description><![CDATA[
<p>Field hockey is almost exclusively a girls sport in the US, while boys have (American) football in the fall. Both draw from the potential pool of soccer players in US middle and high schools.</p>
]]></description><pubDate>Mon, 08 Jun 2026 00:01:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48439916</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=48439916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48439916</guid></item><item><title><![CDATA[New comment by stephencanon in "Superintelligence: The Idea That Eats Smart People (2016)"]]></title><description><![CDATA[
<p>“in a real race we have no actual chance of winning” is an absolutely wild thing to say in response to a link to a real race in which the human has won the last few years in a row.</p>
]]></description><pubDate>Mon, 01 Jun 2026 19:40:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48361605</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=48361605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48361605</guid></item><item><title><![CDATA[New comment by stephencanon in "Tesla is recalling its cheaper Cybertruck because the wheels might fall off"]]></title><description><![CDATA[
<p>What sort of engineering standards are these Cybertrucks built to?<p>Oh, very rigorous engineering standards. The wheels aren't supposed to fall off for a start.</p>
]]></description><pubDate>Fri, 08 May 2026 14:28:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48063754</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=48063754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48063754</guid></item><item><title><![CDATA[New comment by stephencanon in "Heat pump sales rise across Europe"]]></title><description><![CDATA[
<p>We have a ground-source heat pump for our ADU. We did it because we were curious about just how efficient we could make the house, but I don't expect that it will ever break even financially vs a modern air-source system with resistive backup in our climate (northern New England, typically very few –20° nights, –10° to 0° more common with daytime highs in the single digits).<p>It works great, but it's hard to see how it would make sense for most folks here.</p>
]]></description><pubDate>Mon, 04 May 2026 19:50:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48014034</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=48014034</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48014034</guid></item><item><title><![CDATA[New comment by stephencanon in "You can beat the binary search"]]></title><description><![CDATA[
<p>Prior to the current-generation Intel designs, Apple’s branch predictor tables were a good deal larger than Intel’s IIRC, so depending on benchmarking details it’s plausible that Apple Silicon was predicting every branch perfectly in the benchmark, while Intel had a more real-world mispredict rate. Perf counters would confirm.</p>
]]></description><pubDate>Thu, 30 Apr 2026 15:34:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47964078</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47964078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47964078</guid></item><item><title><![CDATA[New comment by stephencanon in "Improving ICU handovers by learning from Scuderia Ferrari F1 team"]]></title><description><![CDATA[
<p>Must be the water.</p>
]]></description><pubDate>Wed, 29 Apr 2026 16:57:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47951132</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47951132</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47951132</guid></item><item><title><![CDATA[New comment by stephencanon in "The GNU libc atanh is correctly rounded"]]></title><description><![CDATA[
<p>The CORE-MATH project authors, most of whom are French academics (including the author of the linked paper).<p>I don’t know of any interesting work in this space that came out of Red Hat; why do you suggest them?</p>
]]></description><pubDate>Sat, 18 Apr 2026 04:11:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47813074</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47813074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47813074</guid></item><item><title><![CDATA[New comment by stephencanon in "The Oxford Comma – Why and Why Not (2024)"]]></title><description><![CDATA[
<p>"I'd like to thank my mother, Ayn Rand, and God" is the usual example.<p>Yes, you can reorder the list to remove the ambiguity, but sometimes the order of the list matters. The serial comma should be used when necessary to remove ambiguity, and not used when it introduces ambiguity. Rewrite the sentence when necessary. Worth noting that this is the Oxford University Press's own style rule!</p>
]]></description><pubDate>Thu, 26 Mar 2026 19:37:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47534707</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47534707</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47534707</guid></item><item><title><![CDATA[New comment by stephencanon in "How many branches can your CPU predict?"]]></title><description><![CDATA[
<p>That would fall under "more constrained", due to process limits.</p>
]]></description><pubDate>Thu, 19 Mar 2026 21:22:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47446366</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47446366</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47446366</guid></item><item><title><![CDATA[New comment by stephencanon in "How many branches can your CPU predict?"]]></title><description><![CDATA[
<p>Enlarging a branch predictor requires area and timing tradeoffs. CPU designers have to balance branch predictor improvements against other improvements they could make with the same area and timing resources. What this tells you is that either Intel is more constrained for one reason or another, or Intel's designers think that they net larger wins by deploying those resources elsewhere in the CPU (which might be because they have identified larger opportunities for improvement, or because they are basing their decision making on a different sample of software, or both).</p>
]]></description><pubDate>Thu, 19 Mar 2026 14:10:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47439789</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47439789</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47439789</guid></item><item><title><![CDATA[New comment by stephencanon in "What every computer scientist should know about floating-point arithmetic (1991) [pdf]"]]></title><description><![CDATA[
<p>Your students should be able to figure out if a computation is exact or not, because they should understand binary representation of numbers.</p>
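<p>A minimal sketch of how that check can be made mechanical, assuming IEEE 754 doubles (Python's float) and a hypothetical helper named sum_is_exact: every finite double is an exact binary rational, so comparing the rounded sum with the exact rational sum shows whether any rounding happened.</p>

```python
from fractions import Fraction


def sum_is_exact(a, b):
    """Return True when the floating-point sum a + b incurred no rounding.

    Fraction(float) converts a finite double to the exact rational it
    represents, so the left-hand side is the true sum of the two doubles.
    """
    return Fraction(a) + Fraction(b) == Fraction(a + b)


# 0.5 and 0.25 are powers of two, so their sum fits in a double exactly.
halves = sum_is_exact(0.5, 0.25)

# 0.1 and 0.2 are already rounded on input, and their sum needs one more
# bit than a double's significand can hold, so the addition rounds again.
tenths = sum_is_exact(0.1, 0.2)
```

<p>The same comparison works for products or differences by swapping the operator; it is an illustration of the reasoning, not a substitute for understanding the representation.</p>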
]]></description><pubDate>Mon, 16 Mar 2026 15:49:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47400611</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47400611</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47400611</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>You can often eke something out for order-four, depending on uArch details. But basically yeah.</p>
]]></description><pubDate>Thu, 12 Mar 2026 00:01:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344339</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47344339</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344339</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>For throughput-dominated contexts, evaluation via Horner's rule does very well because it minimizes register pressure and the number of operations required. But the latency can be relatively high, as you note.<p>There are a few good general options to extract more ILP for latency-dominated contexts, though all of them trade additional register pressure and usually some additional operation count; Estrin's scheme is the most commonly used. Factoring medium-order polynomials into quadratics is sometimes a good option (not all such factorizations are well behaved wrt numerical stability, but it also can give you the ability to synthesize selected extra-precise coefficients naturally without doing head-tail arithmetic). Quadratic factorizations are a favorite of mine because (when they work) they yield good performance in _both_ latency- and throughput-dominated contexts, which makes it easier to deliver identical results for scalar and vectorized functions.<p>There's no general "best" option for optimizing latency; when I wrote math library functions day-to-day we just built a table of the optimal evaluation sequence for each order of polynomial up to 8 or so and each microarchitecture and grabbed the one we needed unless there were special constraints that required a different choice.</p>
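<p>To make the difference in evaluation order concrete, here is a rough sketch in Python. The function names horner and estrin are just illustrative, coefficients are assumed to be stored lowest order first, and Python will not show the ILP benefit; the point is only the dependency structure. Horner is one serial chain of multiply-adds, while each level of Estrin's scheme consists of independent multiply-adds that hardware could issue together.</p>

```python
def horner(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x**n as one serial multiply-add chain."""
    acc = 0.0
    for c in reversed(coeffs):
        # Each step depends on the previous one: minimal work, long latency.
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by combining adjacent pairs level by level."""
    terms = list(coeffs)
    if not terms:
        return 0.0
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so every term has a partner
        # All multiply-adds within one level are independent of each other.
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # x, x**2, x**4, ...
    return terms[0]


coeffs = [1.0, -0.5, 0.25, 2.0, -3.0, 0.125, 4.0, 1.0]
serial = horner(coeffs, 0.5)
paired = estrin(coeffs, 0.5)
```

<p>For eight coefficients Horner needs seven dependent multiply-adds, while the Estrin version above has a critical path of three levels at the cost of the extra squarings and more live values. In general the two orderings can round differently; they agree here only because every intermediate value happens to be exactly representable.</p>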
]]></description><pubDate>Wed, 11 Mar 2026 19:52:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47340404</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47340404</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47340404</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>When Intel specced the rsqrt[ps]s and rcp[ps]s instructions ~30 years ago, they didn't fully specify their behavior. They just said their relative error is "smaller than 1.5 * 2⁻¹²," which someone thought was very clever because it gave them leeway to use tables or piecewise linear approximations or digit-by-digit computation or whatever was best suited to future processors. Since these are not IEEE 754 correctly-rounded operations, and there was (by definition) no software that currently used them, this was "fine".<p>And mostly it has been OK, except for some cases like games or simulations that want to get bitwise identical results across HW, which (if they're lucky) just don't use these operations or (if they're unlucky) use them and have to handle mismatches somehow. Compilers never generate these operations implicitly unless you're compiling with some sort of fast-math flag, so you mostly only get to them by explicitly using an intrinsic, and in theory you know what you're signing up for if you do that.<p>However, this did make them unusable for some scenarios where you would otherwise like to use them, so a bunch of graphics and scientific computing and math library developers said "please fully specify these operations next time" and now NEON/SVE and AVX512 have fully-specified reciprocal estimates,¹ which solves the problem unless you have to interoperate between x86 and ARM.<p>¹ e.g. Intel "specifies" theirs here: <a href="https://www.intel.com/content/www/us/en/developer/articles/code-sample/reference-implementations-for-ia-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.html" rel="nofollow">https://www.intel.com/content/www/us/en/developer/articles/c...</a><p>ARM's is a little more readable: <a href="https://developer.arm.com/documentation/ddi0596/2021-03/Shared-Pseudocode/Shared-Functions?lang=en#impl-shared.RecipSqrtEstimate.2" rel="nofollow">https://developer.arm.com/documentation/ddi0596/2021-03/Shar...</a></p>
]]></description><pubDate>Wed, 11 Mar 2026 18:02:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47338970</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47338970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47338970</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>For the asinf libcall on macOS/x86, my former colleague Eric Postpischil invented the novel (at least at the time, I believe) technique of using a Remez-optimized refinement polynomial following rsqrtss instead of the standard Newton-Raphson iteration coefficients, which allowed him to squeeze out just enough extra precision to make the function achieve sub-ulp accuracy. One of my favorite tricks.<p>We didn't carry that algorithm forward to arm64, sadly, because Apple's architects made fsqrt fast enough that it wasn't worth it in scalar contexts.</p>
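<p>For reference, a small sketch of the standard Newton-Raphson refinement step that the Remez-optimized polynomial replaces. This is not the macOS implementation and does not include the tuned coefficients, which are not given here; the helper names are made up, and the starting estimate is a stand-in for a hardware estimate sitting at a relative error of 1.5 * 2**-12.</p>

```python
import math


def refine_rsqrt(x, y):
    """One Newton-Raphson step for y ~ 1/sqrt(x).

    The constants 1.5 and 0.5 are the textbook iteration coefficients;
    a tuned refinement would use different, optimized constants.
    """
    return y * (1.5 - 0.5 * x * y * y)


def relative_error(x, y):
    """Relative error of y as an approximation to 1/sqrt(x)."""
    exact = 1.0 / math.sqrt(x)
    return abs(y - exact) / exact


x = 2.0
# Stand-in for a hardware estimate: the exact value perturbed by 1.5 * 2**-12.
estimate = (1.0 / math.sqrt(x)) * (1.0 + 1.5 * 2.0 ** -12)
refined = refine_rsqrt(x, estimate)

error_before = relative_error(x, estimate)
error_after = relative_error(x, refined)
```

<p>With a starting relative error e, one step leaves an error of roughly 1.5 * e**2, so an estimate good to about 11 bits comes out good to about 22, which is just short of the 24-bit single-precision significand. That shortfall is the gap a better-tuned refinement has to close.</p>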
]]></description><pubDate>Wed, 11 Mar 2026 16:19:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47337582</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47337582</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47337582</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>Ideally either one is just a library call to generate the coefficients. Remez can get into trouble near the endpoints of the interval for asin and require a little bit of manual intervention, however.</p>
]]></description><pubDate>Wed, 11 Mar 2026 16:12:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47337506</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47337506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47337506</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>Newer rsqrt approximations (ARM NEON and SVE, and the AVX512F approximations on x86) make the behavior architectural so this is somewhat less of a problem (it still varies between _architectures_, however).</p>
]]></description><pubDate>Wed, 11 Mar 2026 16:07:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47337427</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47337427</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47337427</guid></item><item><title><![CDATA[New comment by stephencanon in "Faster asin() was hiding in plain sight"]]></title><description><![CDATA[
<p>These sorts of approximations (and more sophisticated methods) are fairly widely used in systems programming, as seen by the fact that Apple's asin is only a couple percent slower and sub-ulp accurate (<a href="https://members.loria.fr/PZimmermann/papers/accuracy.pdf" rel="nofollow">https://members.loria.fr/PZimmermann/papers/accuracy.pdf</a>). I would expect to get similar performance on non-Apple x86 using Intel's math library, which does not seem to have been measured, and significantly better performance while preserving accuracy using a vectorized library call.<p>The approximation reported here is slightly faster but only accurate to about 2.7e11 ulp. That's totally appropriate for the graphics use in question, but no one would ever use it for a system library; less than half the bits are good.<p>Also worth noting that it's possible to go faster without further loss of accuracy--the approximation uses a correctly rounded square root, which is much more accurate than the rest of the approximation deserves. An approximate square root will deliver the same overall accuracy and much better vectorized performance.</p>
]]></description><pubDate>Wed, 11 Mar 2026 15:38:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47337022</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47337022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47337022</guid></item><item><title><![CDATA[New comment by stephencanon in "Yann LeCun's AI startup raises $1B in Europe's largest ever seed round"]]></title><description><![CDATA[
<p>Yann is definitely more well-known outside of academia. Inside academia, it's going to depend a lot on your specific background and how old you are.</p>
]]></description><pubDate>Tue, 10 Mar 2026 13:08:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47322787</link><dc:creator>stephencanon</dc:creator><comments>https://news.ycombinator.com/item?id=47322787</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47322787</guid></item></channel></rss>