Hacker News: moonchild

New comment by moonchild in "SIMD within a register: How I doubled hash table lookup performance"

moonchild — Tue, 29 Jul 2025 21:32:39 +0000

> Later version allowed to scan from arbitrary position by mirroring first bucket as last

you may find my improved table design of interest, which avoids the need for mirroring: https://outerproduct.net/trivial/2022-10-06_hash.html

New comment by moonchild in "The $5000 Compression Challenge (2001)"

moonchild — Sun, 24 Nov 2024 00:20:25 +0000

suppose the data were generated by a csprng with 256 bits of state. then in fact there are just slightly more than 256 bits of entropy there (prng state plus the length of the generated file). only, you'd have a snail's chance in hell of actually finding that seed

New comment by moonchild in "Abstract Interpretation in the Toy Optimizer"

moonchild — Sat, 27 Jul 2024 20:01:25 +0000

> "the" instance

i suppose you can probably do anything with dependent types, but i'm not sure this is a useful perspective. i commend you to read my comments on the red website https://lobste.rs/s/xkcrvn/

(i do think it is a valid question whether abstract interpretation is a good idea)

New comment by moonchild in "Anna’s Archive approaching 1 petabyte"

moonchild — Sat, 20 Jul 2024 08:25:47 +0000

they are talking about treating ocr as lossy. i wonder about making a lossless compression algorithm for text scans based on an ocr; in effect, use the ocr to predict which text will show up and how, and then encode the pixel-level differences on top of that

New comment by moonchild in "Rounding Percentages"

moonchild — Sat, 13 Jul 2024 11:02:57 +0000

> they should be integers

i am not cheating; you are cheating by trying to do integer arithmetic instead of float arithmetic. in particular: 99 * progress is a (potentially big) integer; then the quotient, from my understanding of python, is equal to the mathematical quantity rn(99 * progress / total), which is not trivial to compute. (although cpython does tend to do a particularly bad job of this sort of thing.) (compare with c or with my version of the python, where it would be rn(rn(99 * progress) / rn(total)), rounding twice, which is very easy to compute. i'm not saying the c semantics is better, mind.) when you scale up by 10x, the numerator is <1/2 'ulp' away from the denominator, and so the quotient rounds up to 99 exactly; there is still double rounding (would have been triple rounding in c and my python), because we got rn(rn(99 * progress / total) + 0.5) where what we wanted was rn(99 * progress / total + 0.5) (which is mathematically always the correct result)

i agree it is not common to have so many steps. but if i were providing a routine that was intended to be robust where others were not, i would try to be comprehensive. and i would not try to do int math with floats unless i could show it to be robust (i have sketched such a stunt! https://gist.github.com/moon-chilled/60bd2ba687dc197d93a9d22...). the integer routine is simpler and more honest anyway, and it is obvious that it works uniformly for the entire range

note also that, with x floating, the python expressions 10 * x and 10 * x - 1 are equivalent, meaning the error is on input to the percent function. (if we set progress to 10 * x - 8, the immediately preceding fp number, we do get 99, but there is no deep reason for this, and it differs for different values of 10.)

> if your username is named after the band, good taste :)

my username comes from a book: the neverending story. i am more using the german version of the name these days but i do not feel like making a new account on this godforesaken website

New comment by moonchild in "Beating the L1 cache with value speculation (2021)"

moonchild — Sat, 13 Jul 2024 01:36:07 +0000

depends how you define causality. if you consider the execution of one operation to cause the execution of the next operation in program order, then causality was already broken by simple reordering. if it's a read-write dependency, on the other hand, then it won't be broken (because cpus respect control dependencies); hence, you cannot, for example, replicate oota this way. what's broken is specifically read-read data dependencies. and only on weakly-ordered architectures; it won't do anything on x86

New comment by moonchild in "Co-Dfns v5.7.0"

moonchild — Wed, 10 Jul 2024 21:53:19 +0000

> I've never heard anything about these sorts of graph algorithms being possible with good asymptotics in an array style

yeah—idk graph algos really, but have heard parallelising combinatorial search in general (eg sat) is hard because forks and joins happen heterogeneously and erratically. this 2001 vintage has a bunch of completely sequential graph colouring algorithms in j https://dl.acm.org/doi/pdf/10.1145/570406.570416 (and j at least has sparse matrices!)

> Constant lifting within the compiler is pretty cool, I'll have to look into that.

hrm, it seems to refer to 2 things: 1) constants allocated in a special space; 2) interning. 2 is obviously worthwhile; 1 i would guess is related to memory being weird on the gpu?

New comment by moonchild in "State of Text Rendering 2024"

moonchild — Tue, 09 Jul 2024 08:17:09 +0000

what exactly do you mean by 'global parse'? it's very usual, i think, when operating on data stored in files, to parse them into in-memory structures before operating on them? but it feels like you are talking about something specific to vector rendering

slug builds acceleration structures ahead of time. the structures are overfit to the algorithm in a way that ttf should be but which is economical for video games. that doesn't seem like an interesting concern and nothing about it is specific to the gpu

New comment by moonchild in "State of Text Rendering 2024"

moonchild — Tue, 09 Jul 2024 08:15:45 +0000

lengyel told me he has implemented some sort of hinting on the gpu for slug (i suspect it's not programmable, but didn't ask)

New comment by moonchild in "State of Text Rendering 2024"

moonchild — Tue, 09 Jul 2024 03:59:13 +0000

programmable hinting was already a thing. it's just switching to wasm from a bespoke language

New comment by moonchild in "Properly testing concurrent data structures"

moonchild — Sat, 06 Jul 2024 19:57:51 +0000

> 1/t^p

i don't think that's right. it's just 1/t. after all, after t time, one task must have made progress; since there are t tasks, the probability that i'm the task that made progress is just 1/t

the primary point of confusion, i think, is that getting interrupted does not mean that you are necessarily going to lose the cas

New comment by moonchild in "Spending too much time optimizing for loops"

moonchild — Tue, 02 Jul 2024 07:23:14 +0000

i saw this last week and was incredibly confused. aside from being naive it's totally overfit where general approaches are very well known??

New comment by moonchild in "Rounding Percentages"

moonchild — Tue, 02 Jul 2024 05:52:14 +0000

> float

hm ...

  >>> def percent(progress, total):
  ...     return round(99 * progress / total + 0.5)
  ...
  >>> x = 9007199254740990.0
  >>> x - 1 < x
  True
  >>> percent(x - 1, x)
  100

New comment by moonchild in "Tracing garbage collection for arenas"

moonchild — Thu, 27 Jun 2024 06:18:48 +0000

> alignment granule

if you have metadata to identify object starts then you can do 1 bit per min object size (which can be bigger than the alignment granule)

New comment by moonchild in "A brief introduction to interval arithmetic"

moonchild — Thu, 27 Jun 2024 01:24:29 +0000

even single-output is a dag; you can explode the dag into a tree, but then you pay in time. suppose some expensive term x is used to compute n other terms. after computing x, assuming we want to share it, we have to compute all n terms before killing x; hence x sticks around longer than it might. (this doesn't necessarily lead to explosion, but i'm pretty sure you can use this to construct a scenario that needs a large number of live terms to avoid compromising time complexity; at least n live terms if the dag has nlogn nodes)

New comment by moonchild in "Tracing garbage collection for arenas"

moonchild — Wed, 26 Jun 2024 23:14:19 +0000

greenspun's tenth rule:

> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

New comment by moonchild in "Tracing garbage collection for arenas"

moonchild — Wed, 26 Jun 2024 22:16:27 +0000

i would do a variable-size header 4 or 8 bytes for most small objects (8 if you want to provide alignment, 4 if not or if you can squish your pointers). one bit identifies if the header is small or large; if small, next few bits give the size of the object in words, and the remainder are a bitmap telling which words are pointers. forwarding pointer stomps on the first word of the object

New comment by moonchild in "A brief introduction to interval arithmetic"

moonchild — Wed, 26 Jun 2024 22:00:43 +0000

(dynamic) computation forms a dag, not a tree. i think a sum scan (n inputs -> n outputs) will trigger the worst case. it might be that computations tend to be tree-shaped, so you rarely hit the worst case, but that helps everybody out, not just affine arithmetic

New comment by moonchild in "A brief introduction to interval arithmetic"

moonchild — Wed, 26 Jun 2024 09:26:03 +0000

feel free to ping on irc if interested in chatting more on the topic, though i'm not sure i have a ton more original thoughts atm

if you have an expression with n terms, then you will end up with O(n) terms each taking up O(n) space, so the overall space usage is quadratic. (that's the fair way to compare to polyhedra since they're always global)

New comment by moonchild in "A brief introduction to interval arithmetic"

moonchild — Wed, 26 Jun 2024 08:25:15 +0000

so a middle ground (as many pseudo-symbolic approaches). glanced at wikipedia—this seems not dissimilar conceptually to the abstract domain of polyhedra, in that it's symbolic but has a flat structure and expresses only linear relationships. of course, polyhedra are exponential where affine arithmetic is quadratic (overall space in number of terms), and polyhedra are global where affine is local (therefore easier to implement as a library, but no sharing). more missed connections between numerical analysis and program analysis? (meh but no sharing probably means affine generally loses for a given resource budget. would still be interesting to try though)