<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: BeeOnRope</title><link>https://news.ycombinator.com/user?id=BeeOnRope</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 23:10:35 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=BeeOnRope" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by BeeOnRope in "Cloud VM benchmarks 2026"]]></title><description><![CDATA[
<p>> The most insane part here is that the AMD EPYC 4565p can beat the turin's used on the cloud providers, by as much as 2x in the single core.<p>That is ... hard to believe for a CPU-bound task. Do you have any open benchmark which can reproduce that?</p>
]]></description><pubDate>Mon, 09 Mar 2026 12:49:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47308364</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=47308364</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47308364</guid></item><item><title><![CDATA[New comment by BeeOnRope in "What every compiler writer should know about programmers (2015) [pdf]"]]></title><description><![CDATA[
<p>What I mean is that we look at a function in isolation and see that it doesn't have any "dead code", e.g.:<p><pre><code>  int factorial(int x) {
    if (x < 0) throw invalid_input();
    // compute factorial ...
  }
</code></pre>
This doesn't have any dead code on static examination. At compile time, however, this function may be compiled multiple times, e.g., as factorial(5), or as factorial(x) where x is known to be non-negative by range analysis. In those cases the `if (x < 0)` check is simply pruned away as "dead code", and you definitely want this! It's not a minor thing; it's a core component of an optimizing compiler.<p>This same pruning is also responsible for the objectionable removal of code in the examples of compilers working at cross-purposes to programmers, but it's not easy to have the former behavior without the latter, and that's also why something like -Wdead-code is hard to implement in a way that doesn't generate constant false positives.</p>
]]></description><pubDate>Tue, 17 Feb 2026 18:12:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47050831</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=47050831</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47050831</guid></item><item><title><![CDATA[New comment by BeeOnRope in "What every compiler writer should know about programmers (2015) [pdf]"]]></title><description><![CDATA[
<p>Why is that a problem? Inlining and optimization aren't minor aspects of compiling to native code; they are responsible for order-of-magnitude speedups.<p>My point is that it is easy to say "don't remove my code" while looking at a simple single-function example, but in actual compilation huge portions of a function are "dead" after inlining, constant propagation, and other optimizations: to say nothing of C-specific UB or other shenanigans. You don't want to throw that out.</p>
]]></description><pubDate>Tue, 17 Feb 2026 06:22:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044308</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=47044308</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044308</guid></item><item><title><![CDATA[New comment by BeeOnRope in "What every compiler writer should know about programmers (2015) [pdf]"]]></title><description><![CDATA[
<p>Dead code is extremely common in C or C++ after inlining and other optimizations.</p>
]]></description><pubDate>Tue, 17 Feb 2026 05:58:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044202</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=47044202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044202</guid></item><item><title><![CDATA[New comment by BeeOnRope in "How many registers does an x86-64 CPU have? (2020)"]]></title><description><![CDATA[
<p>Relevant section:<p>> Compiler controlled memory: There is a mechanism in the processor where frequently accessed memory locations can be as fast as registers. In Figure 2, if the address of u is the same as x, then the last load μ-op is a nop. The internal value in register r25 is forwarded to register r28, by a process called write-buffer feedforwarding. That is to say, provided the store is pending or the value to be stored is in the write-buffer, then loading form a memory location is as fast as accessing external registers.<p>I think it oversells the benefit. Store forwarding is a thing, but it does not erase the cost of the load or store: certainly not on the last ~20 years of chips, and I don't think on the PII (the target of the paper) either.<p>The load and store still effectively occur in terms of port usage, so the usual throughput limits, etc., apply. There is a latency benefit of a few cycles. Perhaps the L1 cache access itself is also omitted, which could help with bank conflicts, though on later uarches there were few to none of these, so you're left with perhaps a small power benefit.</p>
]]></description><pubDate>Tue, 17 Feb 2026 05:54:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044180</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=47044180</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044180</guid></item><item><title><![CDATA[New comment by BeeOnRope in "How many registers does an x86-64 CPU have? (2020)"]]></title><description><![CDATA[
<p>More registers leads to less spilling, not more, unless the compiler is making some really bad choices.<p>An easy way to see that is that the system with more registers can always use the same register allocation as the one with fewer, ignoring the extra registers, if that's profitable (i.e., it's not forced into using extra caller-saved registers if it doesn't want to).</p>
]]></description><pubDate>Sat, 14 Feb 2026 16:09:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47015594</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=47015594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47015594</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Converting a $3.88 analog clock from Walmart into a ESP8266-based Wi-Fi clock"]]></title><description><![CDATA[
<p>User variance? Any evidence?</p>
]]></description><pubDate>Tue, 10 Feb 2026 02:10:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46954466</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=46954466</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46954466</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust"]]></title><description><![CDATA[
<p>How does prek handle pre-push hooks? I.e., how does it determine the list of modified files?<p>This is a long-standing sore point in pre-commit; see <a href="https://github.com/pre-commit/pre-commit/issues/860" rel="nofollow">https://github.com/pre-commit/pre-commit/issues/860</a> and the linked duplicates (some of which are not duplicates).</p>
]]></description><pubDate>Tue, 03 Feb 2026 23:12:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46878760</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=46878760</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46878760</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust"]]></title><description><![CDATA[
<p>If you had a shell script hook, yes, you would also run that in CI.<p>Are you asking what advantage pre-commit has over a shell script?<p>Mostly just functionality: running multiple hooks, running them in parallel, deciding which hooks to run based on the committed files, "decoding" the commit to a list of files, offering a bunch of canned hooks, and offering the ability to write and install non-shell hooks in a standard way.</p>
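For illustration, a minimal config sketch showing a few of those features (the rev and the local hook here are my own assumptions, not from the thread): canned hooks, a project-specific hook, and per-file filtering via `files`.

```yaml
# Hypothetical .pre-commit-config.yaml; the rev and local hook are
# illustrative assumptions.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0              # canned hooks shipped by the pre-commit project
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: local
    hooks:
      - id: project-lint     # hypothetical project-specific hook
        name: project-lint
        entry: scripts/lint.sh
        language: script
        files: '\.(c|cc|h)$' # run only on matching files from the commit
```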
]]></description><pubDate>Tue, 03 Feb 2026 23:03:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46878662</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=46878662</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46878662</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Prek: A better, faster, drop-in pre-commit replacement, engineered in Rust"]]></title><description><![CDATA[
<p>They integrate well with CI.<p>You run the same hooks in CI as locally, so it's DRY and pushes people to run the hooks locally to get early feedback instead of failing in CI.<p>Hooks without CI are less useful, since they will be constantly broken.</p>
]]></description><pubDate>Tue, 03 Feb 2026 17:30:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46874044</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=46874044</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46874044</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Without the futex, it's futile"]]></title><description><![CDATA[
<p>Critical sections were, IIRC, built on top of Windows manual-/auto-reset events, which are a different primitive: useful for more than just mutexes, but without the userspace coordination aspect (the 32-bit value) of futexes.</p>
]]></description><pubDate>Wed, 20 Aug 2025 01:00:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=44957710</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=44957710</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44957710</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Using Radicle CI"]]></title><description><![CDATA[
<p>What is a BM?</p>
]]></description><pubDate>Wed, 23 Jul 2025 17:14:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44661594</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=44661594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44661594</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Async I/O on Linux in databases"]]></title><description><![CDATA[
<p>What is the point of the intent entry at all? It seems like operations are only durable after the completion record is written, so the intent record seems to serve no purpose (unless it is, say, much larger).</p>
]]></description><pubDate>Sun, 20 Jul 2025 17:34:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44627365</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=44627365</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44627365</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Framework Laptop 12 review"]]></title><description><![CDATA[
<p>No there are vanilla 64GB shipping now too, e.g. Crucial CT2K64G56C46S5.</p>
]]></description><pubDate>Thu, 19 Jun 2025 00:51:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44314508</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=44314508</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44314508</guid></item><item><title><![CDATA[New comment by BeeOnRope in "What If We Could Rebuild Kafka from Scratch?"]]></title><description><![CDATA[
<p>Writes to the log don't need to be in the order of the producers' timestamps; they just need to be in <i>some</i> order (and respect ack-to-produce causality, etc.).</p>
]]></description><pubDate>Sun, 27 Apr 2025 03:48:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=43809306</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=43809306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43809306</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Better Shell History Search"]]></title><description><![CDATA[
<p>Yes, I rely heavily on ^ and ' in Atuin, though that's partly to work around the relatively poor fuzzy search (in fzf I never even needed those).</p>
]]></description><pubDate>Thu, 03 Apr 2025 01:56:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43563868</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=43563868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43563868</guid></item><item><title><![CDATA[New comment by BeeOnRope in "Better Shell History Search"]]></title><description><![CDATA[
<p>I use Atuin and like it a lot, and sync history across hosts.<p>However, the fuzzy search in Atuin is worse than fzf's, which was a downgrade. It just has less effective heuristics/scoring: e.g., it might find the individual letters of a short command scattered across a command that had a long base64 input or something.</p>
]]></description><pubDate>Fri, 28 Mar 2025 00:18:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=43499841</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=43499841</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43499841</guid></item><item><title><![CDATA[New comment by BeeOnRope in "How I got 100% off my train travel"]]></title><description><![CDATA[
<p>What is a late slip?</p>
]]></description><pubDate>Wed, 19 Mar 2025 23:47:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=43418480</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=43418480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43418480</guid></item><item><title><![CDATA[New comment by BeeOnRope in "The Slotted Counter Pattern (2020)"]]></title><description><![CDATA[
<p>An uncontended CAS without carried dependencies on the result (almost always the case here) is similar in performance to an atomic add on most platforms.<p>The CAS is the price they pay for contention detection, though it would be interesting to consider solutions which usually use unconditional atomics with only the occasional CAS to check for contention, or which rely on some other contention-detection approach (e.g., doing a second read to detect when the value incremented by more than your own increment).<p>The solution looks reasonable to me given the constraints.</p>
]]></description><pubDate>Wed, 05 Feb 2025 01:29:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42942259</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=42942259</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42942259</guid></item><item><title><![CDATA[New comment by BeeOnRope in "The Slotted Counter Pattern (2020)"]]></title><description><![CDATA[
<p>Yeah, exactly, and this is a commonly used trick in concurrent data structures in general. The Java implementation has the additional twist that it doesn't use a fixed number of "slots", but rather starts at 1 and uses CAS failures as a mechanism to detect contention, growing the number of slots until there are no longer CAS failures.</p>
]]></description><pubDate>Tue, 04 Feb 2025 20:06:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=42937852</link><dc:creator>BeeOnRope</dc:creator><comments>https://news.ycombinator.com/item?id=42937852</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42937852</guid></item></channel></rss>