<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: bassp</title><link>https://news.ycombinator.com/user?id=bassp</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 21 Apr 2026 12:20:21 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=bassp" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by bassp in "The Coming Need for Formal Specification"]]></title><description><![CDATA[
<p>> Some problems are straightforward to specify. A file system is a good example.<p>I’ve got to disagree with this - if only specifying a file system were easy!<p>From the horse’s mouth, the authors of the first “properly” verified FS (that I’m aware of), FSCQ, note that:<p>> we
wrote specifications for a subset of the POSIX system calls using CHL, implemented those calls inside of Coq, and proved that the implementation of each call meets its specification. We devoted
substantial effort to building reusable proof automation for CHL. However, writing specifications and proofs still took a significant
amount of time, compared to the time spent writing the implementation<p>(Reference: <a href="https://dspace.mit.edu/bitstream/handle/1721.1/122622/cacm%20(002).pdf" rel="nofollow">https://dspace.mit.edu/bitstream/handle/1721.1/122622/cacm%2...</a>)<p>And that’s for a file system that only implements a <i>subset</i> of POSIX system calls!</p>
]]></description><pubDate>Sat, 13 Dec 2025 06:53:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46252622</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=46252622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46252622</guid></item><item><title><![CDATA[New comment by bassp in "Prefix sum: 20 GB/s (2.6x baseline)"]]></title><description><![CDATA[
<p>Yes! There’s a canonical algorithm called the “Blelloch scan” for prefix sum (aka prefix scan, because you can generalize “sum” to “any binary associative function”) that’s very GPU-friendly. I have… fond is the wrong word, but “strong” memories of implementing it in a parallel programming class :)<p>Here’s a link to a pretty accessible writeup, if you’re curious about the details: <a href="https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda" rel="nofollow">https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-co...</a></p>
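For readers who want to see the shape of it, here is a minimal sequential simulation of the Blelloch scan's two phases (up-sweep and down-sweep); on a GPU, each inner loop body runs in parallel, one thread per pair. The function name and the power-of-two length restriction are just for illustration.

```python
import operator

def blelloch_exclusive_scan(a, op=operator.add, identity=0):
    """Work-efficient exclusive scan (Blelloch), simulated sequentially.
    On a GPU, each inner `for` loop runs as one thread per pair."""
    n = len(a)
    assert n and (n & (n - 1)) == 0, "power-of-two length, for simplicity"
    t = list(a)
    # Up-sweep (reduce): build partial sums up an implicit binary tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            t[i + 2 * d - 1] = op(t[i + d - 1], t[i + 2 * d - 1])
        d *= 2
    # Down-sweep: replace the root with the identity, then push
    # partial sums back down, swapping as we go.
    t[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            left = t[i + d - 1]
            t[i + d - 1] = t[i + 2 * d - 1]
            t[i + 2 * d - 1] = op(left, t[i + 2 * d - 1])
        d //= 2
    return t

print(blelloch_exclusive_scan([1, 2, 3, 4, 5, 6, 7, 8]))
# → [0, 1, 3, 6, 10, 15, 21, 28]
```

Both phases do O(n) total work, which is why this formulation beats the naive O(n log n) parallel scan on real hardware.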
]]></description><pubDate>Tue, 14 Oct 2025 19:23:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=45583737</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=45583737</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45583737</guid></item><item><title><![CDATA[New comment by bassp in "You might not need Redis"]]></title><description><![CDATA[
<p>Sure, but that’s not what the person responding to my original comment was suggesting :). They suggested that you serialize entire data structures (bloom filters, lists, sets, etc…) into a relational DB to get Redis-like functionality out of it; I chose a list as an example to illustrate why that’s not a great option in many cases.<p>You’re right that managing lists in RDBMSes is easy-ish, if you don’t have too many of them, and they’re not too large. But, like I mentioned in my original comment, Redis really shines as a complex data structure server. I wouldn’t want to implement my own cuckoo filter in Postgres!</p>
]]></description><pubDate>Sat, 08 Mar 2025 20:53:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=43303411</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=43303411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43303411</guid></item><item><title><![CDATA[New comment by bassp in "You might not need Redis"]]></title><description><![CDATA[
<p>They can (and that's probably the right choice for a lot of use cases, especially for small data structures and infrequently updated ones), but serializing and storing them in a database requires you to (in your application code) implement synchronization logic and pay the performance cost for said logic; for instance, if you want to `append` to a shared list, you need to deserialize the list, append to the end of it in your application code, and write it back to the DB. You'd need to use some form of locking to prevent appends from overwriting each other, incurring a pretty hefty perf penalty for hot lists. Also, reading an entire list/tree/set/whatever back <i>just</i> to add/delete one element is very wasteful (bandwidth/[de]serialization cost-wise)</p>
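A minimal sketch of that read-modify-write cycle, using an in-memory SQLite table (the `kv` schema and the `hot_list` key are made up for illustration):

```python
import json
import sqlite3

# Illustrative key/value table holding a JSON-serialized list.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO kv VALUES ('hot_list', ?)", (json.dumps([1, 2]),))

def append_to_list(conn, key, item):
    # The whole cycle must run under a write lock (here, one transaction)
    # so concurrent appends don't overwrite each other.
    conn.execute("BEGIN IMMEDIATE")
    (raw,) = conn.execute(
        "SELECT value FROM kv WHERE key = ?", (key,)
    ).fetchone()
    lst = json.loads(raw)        # deserialize the ENTIRE list...
    lst.append(item)             # ...for a one-element change...
    conn.execute(
        "UPDATE kv SET value = ? WHERE key = ?", (json.dumps(lst), key)
    )                            # ...then reserialize and write it all back
    conn.execute("COMMIT")

append_to_list(conn, "hot_list", 3)
```

Every append pays for a full round trip of the whole structure plus the lock, versus a single O(1) `RPUSH` on a Redis list.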
]]></description><pubDate>Sat, 08 Mar 2025 19:50:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=43302932</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=43302932</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43302932</guid></item><item><title><![CDATA[New comment by bassp in "You might not need Redis"]]></title><description><![CDATA[
<p>I agree with the author 100% (the TanTan anecdote is great, super clever work!), but... sometimes you do need Redis, because Redis is the only production-ready "data structure server" I'm aware of<p>If you want to access a bloom filter, cuckoo filter, list, set, bitmap, etc... from multiple instances of the same service, Redis (or Valkey, MemoryDB, etc...) is really your only option</p>
]]></description><pubDate>Sat, 08 Mar 2025 18:23:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=43302229</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=43302229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43302229</guid></item><item><title><![CDATA[New comment by bassp in "Kafka at the low end: how bad can it get?"]]></title><description><![CDATA[
<p>I use Kafka for a low-message-volume use case because it lets my downstream consumers replay messages… but yeah, in most cases, it’s overkill</p>
]]></description><pubDate>Wed, 19 Feb 2025 03:42:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=43098332</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=43098332</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43098332</guid></item><item><title><![CDATA[New comment by bassp in "Beej's Guide to Git"]]></title><description><![CDATA[
<p>Your network programming guide really saved my bacon back when I was taking a networking class, I appreciate all your hard work!</p>
]]></description><pubDate>Wed, 05 Feb 2025 04:02:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=42943604</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42943604</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42943604</guid></item><item><title><![CDATA[New comment by bassp in "Is gRPC Better for Microservices Than GraphQL?"]]></title><description><![CDATA[
<p>That's really clever! Kudos. I'm gonna set aside some time this week to dive into the implementation</p>
]]></description><pubDate>Sat, 25 Jan 2025 20:48:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42824727</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42824727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42824727</guid></item><item><title><![CDATA[New comment by bassp in "Is gRPC Better for Microservices Than GraphQL?"]]></title><description><![CDATA[
<p>IME, yes.<p>Here are a couple of problems I've run into using GQL for backend to backend communication:<p>* Auth. Good GQL APIs think carefully about permission management on a <i>per-field basis</i> (bad GQL APIs slap some auth on an entire query or type and call it a day). Back-end services, obviously, are not front end clients, and want auth that grants their service access to an entire query, object, or set of queries/mutations. This leads to tension, and (often) hacky workarounds, like back-end services pretending to be "admin users" to get the access they need to a GQL API.<p>* Nested federation. Federation is super powerful, and, to be fair, data loaders do a great job of solving the N+1 query problem when a query only has one "layer" of federation. But, IME, GQL routers are not smart enough to handle <i>nested</i> federation; i.e. querying for a list of object `A`s, then federating object `B` on to each `A`, then federating object `C` on to each `B`. The latency for these kinds of queries is, usually, absolutely terrible, and I'd rather make these kinds of queries over gRPC (e.g. hit one endpoint for all the As, then use the result to get all the Bs, then use all the Bs to get all the Cs)</p>
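A toy sketch of that three-batched-hops pattern from the second bullet (every fetch function and ID field here is a hypothetical stand-in for an RPC):

```python
# Three batched round trips — all the As, then all the Bs, then all the
# Cs — instead of a GQL router fanning out one sub-query per object.

def fetch_as():                      # stand-in for the first RPC
    return [{"id": a, "b_id": a * 10} for a in (1, 2, 3)]

def fetch_bs(b_ids):                 # one batched call, not len(As) calls
    return {b: {"id": b, "c_id": b * 10} for b in b_ids}

def fetch_cs(c_ids):                 # again, one batched call
    return {c: {"id": c} for c in c_ids}

a_objs = fetch_as()                                 # hop 1: all the As
bs = fetch_bs([a["b_id"] for a in a_objs])          # hop 2: all the Bs
cs = fetch_cs([b["c_id"] for b in bs.values()])     # hop 3: all the Cs

# Stitch the layers back together client-side: 3 round trips total,
# regardless of how many As came back.
for a in a_objs:
    a["b"] = bs[a["b_id"]]
    a["b"]["c"] = cs[a["b"]["c_id"]]
```

The latency is three sequential hops no matter the fan-out, which is exactly what a naive nested-federation plan fails to guarantee.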
]]></description><pubDate>Sat, 25 Jan 2025 18:45:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=42823599</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42823599</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42823599</guid></item><item><title><![CDATA[New comment by bassp in "The Missing Nvidia GPU Glossary"]]></title><description><![CDATA[
<p>Sorry if I was sloppy with my wording, instruction issuance is what I meant :)<p>I thought that warps weren't issued instructions unless they were ready to execute (i.e. had all the data they needed to execute the next instruction), and that therefore it was a best practice, in most (not all) cases, to have more threads per block than the SM can execute at once so that the warp scheduler can issue instructions to one warp while another waits on a memory read. Is that not true?</p>
]]></description><pubDate>Wed, 15 Jan 2025 00:27:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=42705866</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42705866</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42705866</guid></item><item><title><![CDATA[New comment by bassp in "The Missing Nvidia GPU Glossary"]]></title><description><![CDATA[
<p>You can request up to 1024 threads per block (and an SM can have up to 2048 threads resident, depending on the GPU); each SM can execute between 32 and 128 threads at a time! So you can have a lot more threads assigned to an SM than the SM can run at once</p>
]]></description><pubDate>Tue, 14 Jan 2025 20:44:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42703582</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42703582</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42703582</guid></item><item><title><![CDATA[New comment by bassp in "The Missing Nvidia GPU Glossary"]]></title><description><![CDATA[
<p>I was taught that you want, usually, more threads per block than each SM can execute, because SMs context switch between threads (fancy hardware multithreading!) on memory read stalls to achieve super high throughput.<p>There are, ofc, other concerns like register pressure that could affect the calculus, but if an SM is waiting on a memory read to proceed and doesn’t have any other threads available to run, you’re probably leaving perf on the table (iirc).</p>
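A toy simulation of that argument, with made-up numbers (an 8-cycle memory stall, one issue slot per cycle): once enough warps are resident, the scheduler almost always finds a ready warp, so issue-slot utilization climbs toward 1.

```python
# Each simulated "warp" issues one instruction, then stalls `stall`
# cycles on a memory read. The scheduler picks any ready warp each
# cycle. Numbers are illustrative, not from any specific GPU.

def utilization(n_warps, stall=8, cycles=1000):
    ready_at = [0] * n_warps          # cycle at which each warp can next issue
    issued = 0
    for cycle in range(cycles):
        for w in range(n_warps):      # scheduler: first ready warp wins
            if ready_at[w] <= cycle:
                ready_at[w] = cycle + 1 + stall   # issue, then stall
                issued += 1
                break                 # one issue slot per cycle
    return issued / cycles

print(utilization(1))   # → 0.112  (SM mostly idle, waiting on memory)
print(utilization(9))   # → 1.0    (enough warps to hide every stall)
```

With a single warp the SM spends most cycles waiting; with stall+1 warps resident, every issue slot is filled, which is the "leaving perf on the table" gap.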
]]></description><pubDate>Tue, 14 Jan 2025 19:42:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42702716</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42702716</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42702716</guid></item><item><title><![CDATA[New comment by bassp in "Formal Methods: Just Good Engineering Practice? (2024)"]]></title><description><![CDATA[
<p>Not the OP, but Hillel Wayne’s course/tutorial (<a href="https://www.learntla.com/" rel="nofollow">https://www.learntla.com/</a>) is fantastic. It’s focused on building practical skills, and helped me build enough competence to write a few (simple, but useful!) specs for some of the systems I work on.</p>
]]></description><pubDate>Fri, 10 Jan 2025 18:37:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42658553</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42658553</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42658553</guid></item><item><title><![CDATA[New comment by bassp in "Formal Methods: Just Good Engineering Practice? (2024)"]]></title><description><![CDATA[
<p>It’s not all or nothing!<p>I work on a very “product-y” back end that isn’t fully specified, but I have formally specified <i>parts</i> of it.<p>For instance, I property-based-tested a particularly nasty state machine I wrote to ensure that, no matter what kind of crazy input I called an endpoint with, the underlying state machine never made any invalid transitions. None of the code around the state machine has a formal spec, but because the state machine does, I was able to specify it.<p>In the process, I found some very subtle bugs I’d have never caught via traditional unit testing.</p>
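A miniature version of that pattern, with a hypothetical state machine (in practice a property-based-testing library like Hypothesis would generate the inputs; plain `random` keeps this sketch self-contained):

```python
import random

# The spec: the complete set of legal transitions, written down once.
LEGAL = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}

def step(state, event):
    # The implementation under test: illegal events are a no-op.
    return LEGAL.get((state, event), state)

def check_state_machine(trials=1000, seed=0):
    rng = random.Random(seed)
    events = ["start", "pause", "stop", "garbage"]
    for _ in range(trials):
        state = "idle"
        for event in rng.choices(events, k=20):
            nxt = step(state, event)
            # Property: every transition taken is either a no-op
            # or explicitly allowed by the spec.
            assert nxt == state or LEGAL.get((state, event)) == nxt
            state = nxt
    return True

check_state_machine()
```

The point is the separation: the code around `step` needs no spec at all, but because the transition table exists, thousands of random event sequences can be checked against it mechanically.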
]]></description><pubDate>Fri, 10 Jan 2025 16:32:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=42657137</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42657137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42657137</guid></item><item><title><![CDATA[New comment by bassp in "Century Scale Storage"]]></title><description><![CDATA[
<p>I’m surprised that this piece mentioned Microsoft, but didn’t touch on Microsoft’s solution to this problem: Project Silica (<a href="https://www.microsoft.com/en-us/research/project/project-silica/" rel="nofollow">https://www.microsoft.com/en-us/research/project/project-sil...</a>), which stores data on etched pieces of quartz glass that are <i>supposed</i> to be able to reliably store data for thousands of years. Of course, you still need to solve the dispersal problem, and need to make sure that the knowledge of how to read the glass tablets is passed down, but hey, nothing’s perfect!</p>
]]></description><pubDate>Sat, 14 Dec 2024 22:19:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=42419871</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42419871</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42419871</guid></item><item><title><![CDATA[New comment by bassp in "Comparing AWS S3 with Cloudflare R2: Price, Performance and User Experience"]]></title><description><![CDATA[
<p>That’s a good point!<p>I think I overstated the case a little, I definitely don’t think automated reasoning is some “secret reliability sauce” that nobody else can replicate; it does give me more confidence that Amazon takes reliability very seriously, and is less likely to ship a terrible bug that messes up my data.</p>
]]></description><pubDate>Wed, 27 Nov 2024 18:21:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=42258326</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42258326</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42258326</guid></item><item><title><![CDATA[New comment by bassp in "Comparing AWS S3 with Cloudflare R2: Price, Performance and User Experience"]]></title><description><![CDATA[
<p>Minimally, the two examples I cited: ShardStore and Shuttle. The former is a (lightweight) formally verified key value store used by S3, and the latter is a model checker for concurrent Rust code.<p>Amazon has an entire automated reasoning group (researchers who mostly work on formal methods) working specifically on S3.<p>As far as I’m aware, nobody at Cloudflare is doing similar work for R2. If they are, they’re certainly not publishing!<p>Money might not be the bottleneck for Cloudflare though, you’re totally right</p>
]]></description><pubDate>Wed, 27 Nov 2024 17:11:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=42257682</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42257682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42257682</guid></item><item><title><![CDATA[New comment by bassp in "Comparing AWS S3 with Cloudflare R2: Price, Performance and User Experience"]]></title><description><![CDATA[
<p>Only tangentially related to the article, but I’ve never understood <i>how</i> R2 offers 11 9s of durability. I trust that S3 offers 11 9s because Amazon has shown, publicly, that they care a ton about designing reliable, fault tolerant, correct systems (eg ShardStore and Shuttle)<p>Cloudflare’s documentation just says “we offer 11 9s, same as S3”, and that’s that. It’s not that I don’t believe them but… how can a smaller organization make the same guarantees?<p>It implies to me that either Amazon is wasting a ton of money on their reliability work (possible) or that Cloudflare’s 11 9s guarantee comes with some asterisks.</p>
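For concreteness, here is what "11 9s" of annual durability works out to as an expected loss rate (the object count is illustrative):

```python
# "11 nines" = 99.999999999% probability that any given object
# survives a year, i.e. an annual loss probability of 1e-11.
durability = 0.99999999999
objects = 10_000_000                 # ten million stored objects

expected_annual_loss = objects * (1 - durability)
print(expected_annual_loss)          # ~0.0001 objects lost per year
```

At that rate you'd expect to lose roughly one object per ten thousand years across ten million objects, which is why the claim is hard to validate empirically and rests almost entirely on design arguments.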
]]></description><pubDate>Wed, 27 Nov 2024 16:58:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=42257572</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42257572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42257572</guid></item><item><title><![CDATA[New comment by bassp in "Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS"]]></title><description><![CDATA[
<p>Gotcha! Thanks for the answer; so the tl;dr is, if I’m understanding:<p>“All fsync-ed writes will eventually make it to S3, but fsync successfully returning only guarantees that writes are durable in our NFS caching layer, not in the S3 layer”?</p>
]]></description><pubDate>Mon, 18 Nov 2024 18:32:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=42175401</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42175401</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42175401</guid></item><item><title><![CDATA[New comment by bassp in "Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS"]]></title><description><![CDATA[
<p>For sure! Upon reflection, maybe I’m less curious about crash consistency (corruption or whatever) per se, and more about what kinds of durability guarantees I can expect in the presence of a crash.<p>I’m specifically interested in how you’re handling synchronization between the NFS layer and S3 wrt fsync. The description says that data is “asynchronously” written back out to S3. That implies to me that it’s possible for something like this to happen:<p>1. I write to a file and fsync it<p>2. Your NFS layer makes the file durable and returns<p>3. Your NFS layer crashes (oh no, the intern merged some bad terraform!) before it writes back to S3<p>4. I go to read the file from S3… and it’s not there!<p>Is that possible? IE is the <i>only</i> way to get a consistent view of the data by reading “through” the nfs layer, even if I fsync?</p>
]]></description><pubDate>Mon, 18 Nov 2024 18:15:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=42175193</link><dc:creator>bassp</dc:creator><comments>https://news.ycombinator.com/item?id=42175193</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42175193</guid></item></channel></rss>