<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: moonchild</title><link>https://news.ycombinator.com/user?id=moonchild</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 03:38:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=moonchild" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by moonchild in "SIMD within a register: How I doubled hash table lookup performance"]]></title><description><![CDATA[
<p>> Later version allowed to scan from arbitrary position by mirroring first bucket as last<p>you may find my improved table design of interest, which avoids the need for mirroring: <a href="https://outerproduct.net/trivial/2022-10-06_hash.html" rel="nofollow">https://outerproduct.net/trivial/2022-10-06_hash.html</a></p>
]]></description><pubDate>Tue, 29 Jul 2025 21:32:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44728493</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=44728493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44728493</guid></item><item><title><![CDATA[New comment by moonchild in "The $5000 Compression Challenge (2001)"]]></title><description><![CDATA[
<p>suppose the data were generated by a csprng with 256 bits of state.  then in fact there are just slightly more than 256 bits of entropy there (prng state plus the length of the generated file).  only, you'd have a snail's chance in hell of actually finding that seed</p>
]]></description><pubDate>Sun, 24 Nov 2024 00:20:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=42224986</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=42224986</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42224986</guid></item><item><title><![CDATA[New comment by moonchild in "Abstract Interpretation in the Toy Optimizer"]]></title><description><![CDATA[
<p>> "the" instance<p>i suppose you can probably do anything with dependent types, but i'm not sure this is a useful perspective.  i commend you to read my comments on the red website <a href="https://lobste.rs/s/xkcrvn/" rel="nofollow">https://lobste.rs/s/xkcrvn/</a><p>(i do think it is a valid question whether abstract interpretation is a good idea)</p>
]]></description><pubDate>Sat, 27 Jul 2024 20:01:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=41088949</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=41088949</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41088949</guid></item><item><title><![CDATA[New comment by moonchild in "Anna’s Archive approaching 1 petabyte"]]></title><description><![CDATA[
<p>they are talking about treating ocr as lossy.  i wonder about making a <i>lossless</i> compression algorithm for text scans based on an ocr; in effect, use the ocr to predict which text will show up and how, and then encode the pixel-level differences on top of that</p>
]]></description><pubDate>Sat, 20 Jul 2024 08:25:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=41014996</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=41014996</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41014996</guid></item><item><title><![CDATA[New comment by moonchild in "Rounding Percentages"]]></title><description><![CDATA[
<p>> they should be integers<p>i am not cheating; you are cheating by trying to do integer arithmetic instead of float arithmetic.  in particular: 99 * progress is a (potentially big) integer; then the quotient, from my understanding of python, is equal to the mathematical quantity rn(99 * progress / total), which is <i>not</i> trivial to compute.  (although cpython does tend to do a particularly bad job of this sort of thing.)  (compare with c or with my version of the python, where it would be rn(rn(99 * progress) / rn(total)), rounding twice, which is very easy to compute.  i'm not saying the c semantics is <i>better</i>, mind.)  when you scale up by 10x, the numerator is <1/2 'ulp' away from the denominator, and so the quotient rounds up to 99 exactly; there is still double rounding (would have been triple rounding in c and my python), because we got rn(rn(99 * progress / total) + 0.5) where what we wanted was rn(99 * progress / total + 0.5) (which is mathematically always the correct result)<p>i agree it is not common to have so many steps.  but if i were providing a routine that was intended to be robust where others were not, i would try to be comprehensive.  and i would not try to do int math with floats unless i could show it to be robust (i have sketched such a stunt! <a href="https://gist.github.com/moon-chilled/60bd2ba687dc197d93a9d2251229d715" rel="nofollow">https://gist.github.com/moon-chilled/60bd2ba687dc197d93a9d22...</a>).  the integer routine is simpler and more honest anyway, and it is obvious that it works uniformly for the entire range<p>note also that, with x floating, the python expressions 10 * x and 10 * x - 1 are equivalent, meaning the error is on <i>input</i> to the percent function.  (if we set progress to 10 * x - 8, the immediately preceding fp number, we do get 99, but there is no deep reason for this, and it differs for different values of 10.)<p>> if your username is named after the band, good taste :)<p>my username comes from a book: the neverending story.  i am more using the german version of the name these days but i do not feel like making a new account on this godforesaken website</p>
]]></description><pubDate>Sat, 13 Jul 2024 11:02:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=40953319</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40953319</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40953319</guid></item><item><title><![CDATA[New comment by moonchild in "Beating the L1 cache with value speculation (2021)"]]></title><description><![CDATA[
<p>depends how you define causality.  if you consider the execution of one operation to cause the execution of the next operation in program order, then causality was already broken by simple reordering.  if it's a read-write dependency, on the other hand, then it won't be broken (because cpus respect control dependencies); hence, you cannot, for example, replicate oota this way.  what's broken is specifically read-read data dependencies.  and only on weakly-ordered architectures; it won't do anything on x86</p>
]]></description><pubDate>Sat, 13 Jul 2024 01:36:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=40951072</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40951072</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40951072</guid></item><item><title><![CDATA[New comment by moonchild in "Co-Dfns v5.7.0"]]></title><description><![CDATA[
<p>> I've never heard anything about these sorts of graph algorithms being possible with good asymptotics in an array style<p>yeah—idk graph algos really, but have heard parallelising combinatorial search in general (eg sat) is hard because forks and joins happen heterogeneously and erratically.  this 2001 vintage has a bunch of completely sequential graph colouring algorithms in j <a href="https://dl.acm.org/doi/pdf/10.1145/570406.570416" rel="nofollow">https://dl.acm.org/doi/pdf/10.1145/570406.570416</a> (and j at least has sparse matrices!)<p>> Constant lifting within the compiler is pretty cool, I'll have to look into that.<p>hrm, it seems to refer to 2 things: 1) constants allocated in a special space; 2) interning.  2 is obviously worthwhile; 1 i would guess is related to memory being weird on the gpu?</p>
]]></description><pubDate>Wed, 10 Jul 2024 21:53:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=40931904</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40931904</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40931904</guid></item><item><title><![CDATA[New comment by moonchild in "State of Text Rendering 2024"]]></title><description><![CDATA[
<p>what exactly do you mean by 'global parse'?  it's very usual, i think, when operating on data stored in files, to parse them into in-memory structures before operating on them?  but it feels like you are talking about something specific to vector rendering<p>slug builds acceleration structures ahead of time.  the structures are overfit to the algorithm in a way that ttf should be but which is economical for video games.  that doesn't seem like an interesting concern and nothing about it is specific to the gpu</p>
]]></description><pubDate>Tue, 09 Jul 2024 08:17:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=40913675</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40913675</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40913675</guid></item><item><title><![CDATA[New comment by moonchild in "State of Text Rendering 2024"]]></title><description><![CDATA[
<p>lengyel told me he has implemented some sort of hinting on the gpu for slug (i suspect it's not programmable, but didn't ask)</p>
]]></description><pubDate>Tue, 09 Jul 2024 08:15:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=40913669</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40913669</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40913669</guid></item><item><title><![CDATA[New comment by moonchild in "State of Text Rendering 2024"]]></title><description><![CDATA[
<p>programmable hinting was already a thing.  it's just switching to wasm from a bespoke language</p>
]]></description><pubDate>Tue, 09 Jul 2024 03:59:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=40912422</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40912422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40912422</guid></item><item><title><![CDATA[New comment by moonchild in "Properly testing concurrent data structures"]]></title><description><![CDATA[
<p>> 1/t^p<p>i don't think that's right.  it's just 1/t.  after all, after t time, one task must have made progress; since there are t tasks, the probability that i'm the task that made progress is just 1/t<p>the primary point of confusion, i think, is that getting interrupted does not mean that you are necessarily going to lose the cas</p>
]]></description><pubDate>Sat, 06 Jul 2024 19:57:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=40892799</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40892799</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40892799</guid></item><item><title><![CDATA[New comment by moonchild in "Spending too much time optimizing for loops"]]></title><description><![CDATA[
<p>i saw this last week and was incredibly confused.  aside from being naive it's totally overfit where general approaches are very well known??</p>
]]></description><pubDate>Tue, 02 Jul 2024 07:23:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=40854306</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40854306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40854306</guid></item><item><title><![CDATA[New comment by moonchild in "Rounding Percentages"]]></title><description><![CDATA[
<p>> float<p>hm ...<p><pre><code>  >>> def percent(progress, total):
  ...     return round(99 * progress / total + 0.5)
  ...
  >>> x = 9007199254740990.0
  >>> x - 1 < x
  True
  >>> percent(x - 1, x)
  100</code></pre></p>
]]></description><pubDate>Tue, 02 Jul 2024 05:52:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=40853875</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40853875</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40853875</guid></item><item><title><![CDATA[New comment by moonchild in "Tracing garbage collection for arenas"]]></title><description><![CDATA[
<p>> alignment granule<p>if you have metadata to identify object starts then you can do 1 bit per min object size (which can be bigger than the alignment granule)</p>
]]></description><pubDate>Thu, 27 Jun 2024 06:18:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40807821</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40807821</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40807821</guid></item><item><title><![CDATA[New comment by moonchild in "A brief introduction to interval arithmetic"]]></title><description><![CDATA[
<p>even single-output is a dag; you can explode the dag into a tree, but then you pay in time.  suppose some expensive term x is used to compute n other terms.  after computing x, assuming we want to share it, we have to compute all n terms before killing x; hence x sticks around longer than it might.  (this doesn't <i>necessarily</i> lead to explosion, but i'm pretty sure you can use this to construct a scenario that needs a large number of live terms to avoid compromising time complexity; at least n live terms if the dag has nlogn nodes)</p>
]]></description><pubDate>Thu, 27 Jun 2024 01:24:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=40806389</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40806389</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40806389</guid></item><item><title><![CDATA[New comment by moonchild in "Tracing garbage collection for arenas"]]></title><description><![CDATA[
<p>greenspun's tenth rule:<p>> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.</p>
]]></description><pubDate>Wed, 26 Jun 2024 23:14:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=40805620</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40805620</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40805620</guid></item><item><title><![CDATA[New comment by moonchild in "Tracing garbage collection for arenas"]]></title><description><![CDATA[
<p>i would do a variable-size header 4 or 8 bytes for most small objects (8 if you want to provide alignment, 4 if not or if you can squish your pointers).  one bit identifies if the header is small or large; if small, next few bits give the size of the object in words, and the remainder are a bitmap telling which words are pointers.  forwarding pointer stomps on the first word of the object</p>
]]></description><pubDate>Wed, 26 Jun 2024 22:16:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=40805207</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40805207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40805207</guid></item><item><title><![CDATA[New comment by moonchild in "A brief introduction to interval arithmetic"]]></title><description><![CDATA[
<p>(dynamic) computation forms a dag, not a tree.  i think a sum scan (n inputs -> n outputs) will trigger the worst case.  it might be that computations tend to be tree-shaped, so you rarely hit the worst case, but that helps everybody out, not just affine arithmetic</p>
]]></description><pubDate>Wed, 26 Jun 2024 22:00:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=40805082</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40805082</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40805082</guid></item><item><title><![CDATA[New comment by moonchild in "A brief introduction to interval arithmetic"]]></title><description><![CDATA[
<p>feel free to ping on irc if interested in chatting more on the topic, though i'm not sure i have a ton more original thoughts atm<p>if you have an expression with n terms, then you will end up with O(n) terms each taking up O(n) space, so the <i>overall</i> space usage is quadratic.  (that's the fair way to compare to polyhedra since they're always global)</p>
]]></description><pubDate>Wed, 26 Jun 2024 09:26:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=40798078</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40798078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40798078</guid></item><item><title><![CDATA[New comment by moonchild in "A brief introduction to interval arithmetic"]]></title><description><![CDATA[
<p>so a middle ground (as many pseudo-symbolic approaches).  glanced at wikipedia—this seems not dissimilar conceptually to the abstract domain of polyhedra, in that it's symbolic but has a flat structure and expresses only linear relationships.  of course, polyhedra are exponential where affine arithmetic is quadratic (overall space in number of terms), and polyhedra are global where affine is local (therefore easier to implement as a library, but no sharing).  more missed connections between numerical analysis and program analysis?  (meh but no sharing probably means affine generally loses for a given resource budget.  would still be interesting to try though)</p>
]]></description><pubDate>Wed, 26 Jun 2024 08:25:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=40797620</link><dc:creator>moonchild</dc:creator><comments>https://news.ycombinator.com/item?id=40797620</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40797620</guid></item></channel></rss>