Hacker News: brandmeyer

New comment by brandmeyer in "OpenTrafficMap"

brandmeyer — Thu, 30 Apr 2026 12:53:33 +0000

The vast majority of POI churn information comes from streetview + machine learning object detection + automatic change detection + human verification. There are many clever algorithms in play through the entire pipeline. As moats go, it's probably bigger than search.

New comment by brandmeyer in "What every computer scientist should know about floating-point arithmetic (1991) [pdf]"

brandmeyer — Mon, 16 Mar 2026 13:27:42 +0000

> they (might) have a float, and are using the `==` operator, they're doing something wrong.

Storage, retrieval, transmission, and serialization/deserialization systems should be able to transmit and round-trip floats without losing any bits at all.
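
A minimal sketch of that round-trip property, assuming IEEE 754 binary64 values and Python's standard `struct` module as a stand-in for a real serialization layer. The helper names `round_trip_bits` and `bits` are hypothetical, chosen for illustration.

```python
import struct

def round_trip_bits(x: float) -> float:
    """Serialize a float to 8 raw little-endian bytes and read it back."""
    return struct.unpack("<d", struct.pack("<d", x))[0]

def bits(x: float) -> int:
    """Return the 64-bit pattern of a float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

samples = [0.1, 1.0 / 3.0, 5e-324, 1.7976931348623157e308, -0.0]
for value in samples:
    restored = round_trip_bits(value)
    # A lossless transport preserves every bit, so exact equality is the right check.
    assert bits(restored) == bits(value)
    assert restored == value
    # Python's repr() emits enough decimal digits to round-trip through text as well.
    assert float(repr(value)) == value
```

In a test for this kind of system, `==` (or a comparison of the raw bit patterns, which also distinguishes `-0.0` from `0.0`) is the check you want, since any tolerance would hide a lossy encoder.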

New comment by brandmeyer in "What every computer scientist should know about floating-point arithmetic (1991) [pdf]"

brandmeyer — Mon, 16 Mar 2026 13:24:52 +0000

Repeating the exercise with something that is exactly representable in floating point like 1/8 instead of 1/10 highlights the difference.
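
The original exercise is not quoted here, so this sketch assumes one common version of it: adding a step repeatedly and comparing the total against 1. The helper `repeated_add` is hypothetical.

```python
from fractions import Fraction

def repeated_add(step: float, count: int) -> float:
    """Add `step` to a running total `count` times, rounding after every addition."""
    total = 0.0
    for _ in range(count):
        total += step
    return total

# 1/8 is a power of two, so the stored double is exactly 1/8 and every partial sum is exact.
eighth_stored_exactly = Fraction(0.125) == Fraction(1, 8)
eighth_total_is_one = repeated_add(0.125, 8) == 1.0

# 1/10 has no finite binary expansion, so the stored double is only the nearest neighbor.
tenth_stored_exactly = Fraction(0.1) == Fraction(1, 10)
tenth_total_is_one = repeated_add(0.1, 10) == 1.0
```

With 1/8 both checks come out true; with 1/10 both come out false, because the error is already present in the stored constant before any arithmetic happens.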

New comment by brandmeyer in "Infrastructure decisions I endorse or regret after 4 years at a startup (2024)"

brandmeyer — Fri, 20 Feb 2026 04:27:58 +0000

They are very similar to the pros and cons of having a monorepo. It encourages information sharing and cross-linkage between related teams. This is simultaneously its biggest pro and its biggest con.

New comment by brandmeyer in "AMD claims Arm ISA doesn't offer efficiency advantage over x86"

brandmeyer — Sat, 13 Sep 2025 15:14:12 +0000

You can't even express static rounding in C. You can't even express it in the LLVM language-independent IR. Any attempt to use the static rounding modes will necessarily involve intrinsics and/or assembly.

New comment by brandmeyer in "AMD claims Arm ISA doesn't offer efficiency advantage over x86"

brandmeyer — Tue, 09 Sep 2025 13:20:54 +0000

Nothing major, just some oddball decisions here and there.

Fused compare-and-branch only extends to the base integer instructions. Anything else needs to generate a value that feeds into a compare-and-branch. Since all branches are compare-and-branch, they all need two register operands, which impairs their reach to a mere +/- 4 kB.

The reach for position-independent code instructions (AUIPC + any load or store) is not quite +/- 2 GB. There is a hole on either end of the reach that is a consequence of using a sign-extended 12-bit offset for loads and stores, and a sign-extended high 20-bit offset for AUIPC. ARM's adrp (address of page) + unsigned offsets is more uniform.
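
The arithmetic behind that reach can be worked through in a few lines. This is a sketch under stated assumptions: RV64 (64-bit address arithmetic, no wraparound), a sign-extended 20-bit AUIPC immediate shifted left by 12, and a sign-extended 12-bit load/store immediate. The names `split_pcrel`, `MIN_REACH`, and `MAX_REACH` are hypothetical.

```python
HI_BITS = 20  # width of the AUIPC immediate
LO_BITS = 12  # width of the load/store immediate

def split_pcrel(offset: int):
    """Split a PC-relative byte offset into (hi20, lo12), or return None if out of reach."""
    # Round to the nearest 4 KiB boundary so the remainder fits a signed 12-bit field.
    hi = (offset + (1 << (LO_BITS - 1))) >> LO_BITS
    lo = offset - (hi << LO_BITS)  # always lands in [-2048, 2047]
    if -(1 << (HI_BITS - 1)) <= hi < (1 << (HI_BITS - 1)):
        return hi, lo
    return None

# Extremes of the reachable window under the assumptions above.
MIN_REACH = -(1 << 31) - (1 << 11)     # most negative hi plus most negative lo
MAX_REACH = (1 << 31) - (1 << 11) - 1  # most positive hi plus most positive lo
```

Under these assumptions the window works out to [-2^31 - 2^11, 2^31 - 2^11 - 1]. It is offset by 2 KiB rather than centered on the PC, so the last 2 KiB below +2 GiB cannot be addressed with a single AUIPC pair.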

RV32 isn't a proper subset of RV64, which isn't a proper subset of RV128. If they were proper subsets, then RV64 programs could run unmodified on RV128 hardware. Not that it's going to ever happen, but if it did, the processor would have to mode-switch, not unlike the x86-64 transition of yore.

Floating point arithmetic spends three bits in the instruction encoding to support static rounding modes. I can count on zero hands the number of times I've needed that.

The integer ISA design goes to great lengths to avoid any instructions with three source operands, in order to simplify the datapaths on tiny machines. But... the floating point extension correctly includes fused multiply-add. So big chunks of any high-end processor will need three-operand datapaths anyway.

The base ISA is entirely too basic, and a classic failure of 90% design. Just because most code doesn't need all those other instructions doesn't mean that most systems don't. RISC-V is gathering extensions like a Katamari to fill in all those holes (B, Zfa, etc).

None of those things make it bad, I just don't think it's nearly as shiny as the hype. ARM64+SVE and x86-64+AVX512 are just better.

New comment by brandmeyer in "4k NASA employees opt to leave agency through deferred resignation program"

brandmeyer — Sun, 27 Jul 2025 15:16:28 +0000

Any argument that is filled with this much ragebait should be dismissed out of hand.

New comment by brandmeyer in "FFmpeg devs boast of another 100x leap thanks to handwritten assembly code"

brandmeyer — Mon, 21 Jul 2025 04:45:16 +0000

ISPC suffers from poor scatter and gather support in hardware. The direct result is that it is hard to make programs that scale in complexity without resorting to shenanigans.

An ideal gather-load or scatter-store instruction should take time proportional to the number of cache lines that it touches. If all of the lane accesses are sequential and cache line aligned it should take the same amount of time as an aligned vector load or store. If the accesses have high cache locality such that only two cache lines are touched, it should cost exactly the same as loading those two cache lines and shuffling the results into place. That isn't what we have on x86-AVX512. The gather and scatter instructions there are microcoded with inefficient lane-at-a-time implementations. If you know that there is good locality of reference in the access, then it can be faster to hand-code your own cache line-at-a-time load/shuffle/masked-merge loop than to rely on the hardware. This makes me sad.
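
To make the "proportional to cache lines touched" cost model concrete, here is a small sketch that counts the distinct lines a set of lane accesses would hit. It assumes 64-byte cache lines and is only a model of the ideal cost, not of what any real hardware does. The function name `cache_lines_touched` is hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed line size

def cache_lines_touched(base: int, indices, element_bytes: int) -> int:
    """Count distinct cache lines covered by accesses to base + i * element_bytes."""
    lines = set()
    for i in indices:
        first = base + i * element_bytes
        last = first + element_bytes - 1
        # An element that straddles a boundary touches both neighboring lines.
        lines.update(range(first // CACHE_LINE_BYTES, last // CACHE_LINE_BYTES + 1))
    return len(lines)

lanes = range(16)  # 16 lanes of 4-byte elements, as in a 512-bit vector

aligned_sequential = cache_lines_touched(0, lanes, 4)
misaligned_sequential = cache_lines_touched(32, lanes, 4)
strided = cache_lines_touched(0, [lane * 16 for lane in lanes], 4)
```

Under this model the aligned sequential case costs one line, the misaligned sequential case costs two, and only the fully strided case justifies sixteen separate accesses.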

ISPC's varying variables have no way to declare that they are sequential among all lanes. Therefore, without extensive inlining to expose the caller's access pattern, it issues scatters and gathers at the drop of a hat. You might like to write your program with a naive x[y] (x a uniform pointer, y a varying index) in a subroutine, but ISPC's language cannot infer that y is sequential along lanes. So, you have to carefully re-code it to say that y is actually a uniform offset into the array, and write x[y + programIndex]. Error-prone, yet utterly essential for decent performance. I resorted to munging my naming conventions for such indexes, not unlike the Hungarian notation of yesteryear.

Rewriting critical data structures in SoA format instead of AoS format is non-trivial, and a prerequisite to get decent performance from ISPC. You cannot "just" replace some subroutines with ISPC routines, you need to make major refactorings that touch the rest of the program as well. This is neutral in an ISPC-versus-intrinsics (or even ISPC-versus-GPU) shootout, but it is worth mentioning only to point out that ISPC is not a silver bullet in this regard either.

Non-minor nit: The ISPC math library gives up far too much precision by default in the name of speed. Fortunately, Sleef is not terribly difficult to integrate and use for the 1-ulp max rounding error that I've come to expect from a competent libm.

Another: The ISPC calling convention adheres rather strictly to the C calling convention... which doesn't provide any callee-saved vector registers, not even for the execution mask. So if you like to decompose your program across multiple compilation units, you will also notice much more register save and restore traffic than you would like or expect.

I want to like it, I can get some work done in it, and I did get significant performance improvements over scalar code when using it. But the resulting source code and object code are not great. They are merely acceptable.

New comment by brandmeyer in "Europe's first geostationary sounder satellite is launched"

brandmeyer — Sat, 05 Jul 2025 21:56:19 +0000

Shout-out to the NOAA GFS team, who publish the GFS analysis directly to AWS S3.

https://registry.opendata.aws/noaa-gfs-bdp-pds/

New comment by brandmeyer in "Fun with C++26 reflection: Keyword Arguments"

brandmeyer — Tue, 11 Feb 2025 02:47:03 +0000

> And I suppose you could write a validator to make sure that this worked.

Like this one!

https://clang.llvm.org/extra/clang-tidy/checks/bugprone/argu...

New comment by brandmeyer in "The origins of 60-Hz as a power frequency (1997)"

brandmeyer — Sat, 08 Feb 2025 02:05:32 +0000

> introduced by PWM dimming, but why would that be a low enough frequency to bother people?

The human fovea has a much lower effective refresh rate than your peripheral vision. So you might notice the flickering of tail lights (and daytime running lights) seen out of the corner of your eye even though you can't notice when looking directly at them.

New comment by brandmeyer in "Software development topics I've changed my mind on"

brandmeyer — Wed, 05 Feb 2025 13:55:48 +0000

clang-format and clang-tidy are both excellent for C and C++ (and protobuf, if your group uses it). Since they are based on the clang front-end, they naturally have full support for both languages and all of their complexity.

New comment by brandmeyer in "A better approach to gravity: how we made EGM2008 faster"

brandmeyer — Wed, 04 Dec 2024 15:21:47 +0000

LUTs are commonly used in geodesy applications on or near the Earth's surface. The full multipole model is used for orbital applications to account for the way that local lumpiness in Earth's mass distribution is smoothed out with increasing distance from the surface. It might be reasonable to build a 3D LUT for use at Starlink scale or higher, but certainly not for individual satellites.

New comment by brandmeyer in "A better approach to gravity: how we made EGM2008 faster"

brandmeyer — Tue, 03 Dec 2024 14:22:56 +0000

Exactly what order and degree were you using to evaluate the model? Variations in drag and solar pressure are more significant than the uncertainty in the gravity field for objects in LEO somewhere much less than 127th order (40 microseconds on my machine, your smileage may vary), so you can safely truncate the model for simulations. GRACE worked by making many passes such that they could average out those perturbations to make their measurement. But for practical applications, those tiny terms are irrelevant.

IERS Technical Note 36 section 6.1 gives recommendations for model truncation if you are looking for justification. https://iers-conventions.obspm.fr/content/tn36.pdf

New comment by brandmeyer in "C uses "&" for the address-operator because 'ampersand sounds like "address"'"

brandmeyer — Tue, 10 Oct 2023 13:21:48 +0000

> "^integer" which is better than "int"

I agree that it reads a little better. But as a small-handed person, it is unfortunately much more uncomfortable to type.

New comment by brandmeyer in "Add extra stuff to a “standard” encoding? Sure, why not"

brandmeyer — Wed, 20 Sep 2023 13:28:57 +0000

> isn’t a standard in that protobuf doesn’t care

Shove protobuf into Something Else that does packet delimitation for you. I'm fond of SQLite for offline cases as a richer alternative to sstable.
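
A rough sketch of the SQLite variant, using Python's standard `sqlite3` module. The table layout, the helper names `store_messages` and `load_messages`, and the byte strings standing in for serialized protobuf messages are all assumptions made for illustration; a real program would store the output of its own message serializer.

```python
import sqlite3

def store_messages(conn: sqlite3.Connection, payloads) -> None:
    """Store each serialized message as one BLOB row; the row boundary is the delimiter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO messages (payload) VALUES (?)",
        [(p,) for p in payloads],
    )
    conn.commit()

def load_messages(conn: sqlite3.Connection):
    """Return the stored payloads in insertion order."""
    rows = conn.execute("SELECT payload FROM messages ORDER BY id")
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
payloads = [b"\x08\x01", b"\x0a\x03abc", b"\x10\x2a"]  # placeholder message bytes
store_messages(conn, payloads)
restored = load_messages(conn)
```

The container handles framing, so the messages themselves never need a length prefix, and extra columns (timestamps, keys) can be added later without touching the message schema.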

New comment by brandmeyer in "Google releases Bard to a limited number of users in the US and UK"

brandmeyer — Tue, 21 Mar 2023 16:47:40 +0000

Shout-out to Firefox Multi-Account Containers, which Just Works (TM) for exactly this use case.

New comment by brandmeyer in "Facebook and Google hand over user data, help police prosecute abortion seekers"

brandmeyer — Sun, 05 Mar 2023 20:16:39 +0000

This comment is a fantastic example of the meta-battle which the tech industry has been waging. They have worked very hard to change the very questions in the privacy debate. In their terms, collection of the data itself is never under debate; all debates are framed in terms of how they are allowed to use data. In this case, the failure isn't the law or the company's compliance with it. It was the collection and retention of the data in the first place.

New comment by brandmeyer in "Can sanitizers find the two bugs I wrote in C++?"

brandmeyer — Wed, 08 Feb 2023 14:07:48 +0000

Running multiple test builds in parallel isn't all that difficult, though. One with ubsan, one with asan, and one (opt-in on a test-by-test basis) with tsan.

New comment by brandmeyer in "Richard Hipp Speaks Out on SQLite (2019) [pdf]"

brandmeyer — Tue, 13 Dec 2022 02:43:39 +0000

> There aren't connections in SQLite like there are for network-connected DBs (eg MySQL)

The handle named `db` in your example is as expensive to create as a connection in the relative sense and should be optimized the same way in your application. It may be less expensive than a network database connection, but it's still very expensive relative to any subsequent queries. The database file is opened, its first page is read, its schema is parsed from text (!), some memory is pre-allocated and so on every time you make that call.

The sqlite3_exec call invokes the parser and query optimizer every time it is called. A better benchmark would be to compile a valid statement once (sqlite3_prepare). The unit under test should just be the sqlite3_bind_*() calls for relevant parameters followed by sqlite3_step(), and maybe (but not necessarily) sqlite3_reset().
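
The same shape can be sketched in Python, with the caveat that the standard `sqlite3` module does not expose prepare, bind, and step as separate calls. Instead it keeps a per-connection cache of compiled statements keyed by SQL text (sized by the `cached_statements` argument to `connect`), so reusing one connection and one parameterized query string approximates the prepare-once pattern. The table and the `lookup` helper are hypothetical.

```python
import sqlite3

# Open the handle once, outside the code being measured.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO kv VALUES (?, ?)",
    [(i, f"value-{i}") for i in range(1000)],
)

# One constant SQL string with a placeholder, so the compiled statement can be reused.
QUERY = "SELECT v FROM kv WHERE k = ?"

def lookup(key: int):
    """Bind one parameter and step the cached statement; no new SQL text is built."""
    row = conn.execute(QUERY, (key,)).fetchone()
    return None if row is None else row[0]

results = [lookup(k) for k in (0, 500, 999, 1000)]
```

Building the SQL with string formatting instead would produce a different statement text for every key and defeat the cache, which is roughly the Python analogue of calling sqlite3_exec in a loop.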