Hacker News: sakras

New comment by sakras in "Two Bits Are Better Than One: making bloom filters 2x more accurate"

sakras — Sun, 22 Feb 2026 05:22:19 +0000

Yeah, I kind of think the authors didn't conduct a thorough enough literature review here. There are well-known relations between the number of hash functions you use and the FPR, cache-blocking and register-blocking are classic techniques (Cache-, Hash-, and Space-Efficient Bloom Filters by Putze et al.), and there are even ways of generating patterns from only a single hash function that work well (shamelessly shilling my own blogpost on the topic: https://save-buffer.github.io/bloom_filter.html).

I also find the use of atomics to build the filter confusing here. If you're doing a join, you're presumably doing a batch of hashes, so it'd be much more efficient to partition your Bloom filter, lock the partitions, and do a bulk insertion.
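
To make the two points above concrete, here is a rough Python sketch, not the paper's code and not the scheme from the linked post. `expected_fpr` is the textbook approximation relating the number of hash functions to the FPR. `PartitionedBloom`, `hash64`, the partition count, and the double-hashing scheme are all illustrative assumptions; the point is only that a batch of hashes can be grouped by partition so each lock is taken once per batch instead of touching shared bits atomically per key.

```python
import hashlib
import math
import threading
from collections import defaultdict


def expected_fpr(m_bits, n_keys, k):
    """Textbook approximation of the false-positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n_keys / m_bits)) ** k


def hash64(key):
    """Illustrative 64-bit hash of a bytes key (any good 64-bit hash would do)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")


class PartitionedBloom:
    """Hypothetical partitioned Bloom filter with one lock per partition."""

    def __init__(self, num_partitions, bits_per_partition, k):
        assert bits_per_partition % 8 == 0
        self.k = k
        self.bits = bits_per_partition
        self.parts = [bytearray(bits_per_partition // 8) for _ in range(num_partitions)]
        self.locks = [threading.Lock() for _ in range(num_partitions)]

    def _partition(self, h):
        # Top 16 bits choose the partition; the low 48 bits choose bit positions.
        return (h >> 48) % len(self.parts)

    def _positions(self, h):
        # Double hashing from one hash value: position_i = h1 + i * h2 (h2 forced odd).
        h1 = h & 0xFFFFFF
        h2 = ((h >> 24) & 0xFFFFFF) | 1
        return [(h1 + i * h2) % self.bits for i in range(self.k)]

    def bulk_insert(self, hashes):
        # Group the batch by partition first, then take each partition lock once.
        by_part = defaultdict(list)
        for h in hashes:
            by_part[self._partition(h)].append(h)
        for p, group in by_part.items():
            with self.locks[p]:
                part = self.parts[p]
                for h in group:
                    for pos in self._positions(h):
                        part[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, h):
        part = self.parts[self._partition(h)]
        return all(part[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(h))
```

In a real join you'd hand different partitions to different threads; the sketch inserts them sequentially to stay short.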

New comment by sakras in "FreeCAD"

sakras — Fri, 20 Feb 2026 07:03:03 +0000

Have you tried SolveSpace? It's easily my favorite open source CAD program. The main things it's missing are shells, fillets, and chamfers. But I've been able to 3D print quite a few parts using it!

New comment by sakras in "F3: Open-source data file format for the future [pdf]"

sakras — Thu, 02 Oct 2025 05:28:10 +0000

Giving it a quick look, it seems like they've addressed a lot of the shortcomings of Parquet, which is very exciting. In no particular order:

- Parquet metadata is Thrift, but with comments saying "if this field exists, this other field must exist", and no code actually verifying the fact, so I'm pretty sure you could feed it bogus Thrift metadata and crash the reader.

- Parquet metadata must be parsed out, meaning you have to: allocate a buffer, read the metadata bytes, and then dynamically keep allocating a whole bunch of stuff as you parse the metadata bytes, since you don't know the size of the materialized metadata! Too many heap allocations! This file format's Flatbuffers approach seems to solve this as you can interpret Flatbuffer bytes directly.

- The encodings are much more powerful. I think a lot of people in the database community have been saying that we need composable/recursive lightweight encodings for a long time. BtrBlocks was the first such format that was open in my memory, and then FastLanes followed up. Both of these were much better than Parquet by itself, so I'm glad ideas from those two formats are being taken up.

- Parquet did the Dremel record-shredding thing which just made my brain explode and I'm glad they got rid of it. It seemed to needlessly complicate the format with no real benefit.

- Parquet datapages might contain different numbers of rows, so you have to scan the whole ColumnChunk to find the row you want. Here it seems like you can just jump to the DataPage (IOUnit) you want.

- They got rid of the heavyweight compression and just stuck with the Delta/Dictionary/RLE stuff. Heavyweight compression never did anything anyway, and was super annoying to implement, and basically required you to pull in 20 dependencies.
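
On the DataPage point above, a toy Python sketch of the difference. Both functions and the "fixed rows per page" layout are hypothetical stand-ins for illustration, not Parquet's or F3's actual reader logic, and the first one assumes no auxiliary index is available to skip the scan.

```python
def find_page_variable(page_row_counts, row):
    """Pages hold varying numbers of rows: walk the page headers in order.

    Returns (page_index, row_offset_within_page).
    """
    start = 0
    for i, n in enumerate(page_row_counts):
        if row < start + n:
            return i, row - start
        start += n
    raise IndexError("row is past the end of the column chunk")


def find_page_fixed(rows_per_page, row):
    """Every page holds the same number of rows: the page index is just arithmetic."""
    return divmod(row, rows_per_page)
```

The first lookup is linear in the number of pages, the second is constant time.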

Overall great improvement, I'm looking forward to this taking over the data analytics space.

New comment by sakras in "Show HN: Luminal – Open-source, search-based GPU compiler"

sakras — Wed, 20 Aug 2025 18:38:13 +0000

I see you guys are using Egg/Egglog! I've been mildly interested in egraphs for quite a while, glad to see they're gaining traction!

New comment by sakras in "New Aarch64 Back End"

sakras — Fri, 25 Jul 2025 00:36:24 +0000

Likely performance - LLVM is somewhat notorious for being slower than ideal.

New comment by sakras in "New Aarch64 Back End"

sakras — Fri, 25 Jul 2025 00:35:14 +0000

Likely refers to Machine IR, a lower level representation that normal LLVM IR lowers to?

New comment by sakras in "Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons"

sakras — Mon, 16 Jun 2025 22:09:52 +0000

Typically requests are binned by context length so that they can be batched together. So you might have a 10k bin and a 50k bin and a 500k bin, and then you drop context past 500k. So the costs are fixed per-bin.
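
A minimal sketch of that binning in Python, using the example bin sizes from above. `BIN_LIMITS` and `assign_bin` are made-up names for illustration, not any particular serving stack's API.

```python
import bisect

# Example bin limits from the comment above (in tokens), sorted ascending.
BIN_LIMITS = [10_000, 50_000, 500_000]


def assign_bin(context_len):
    """Return (bin_limit, tokens_kept) for a request of the given context length."""
    i = bisect.bisect_left(BIN_LIMITS, context_len)
    if i == len(BIN_LIMITS):
        # Longer than the largest bin: context past the limit is dropped.
        return BIN_LIMITS[-1], BIN_LIMITS[-1]
    return BIN_LIMITS[i], context_len
```

Every request in a bin is then costed as if it were the bin's limit, which is why the cost is fixed per bin.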

New comment by sakras in "How linear regression works intuitively and how it leads to gradient descent"

sakras — Thu, 08 May 2025 18:43:26 +0000

I intuitively think about linear regression as attaching a spring between every point and your regression line (and constraining the spring to be vertical). When the line settles, that's your regression! Also gives a physical intuition about what happens to the line when you add a point. Adding a point at the very end will "tilt" the line, while adding a point towards the middle of your distribution will shift it up or down.

A while ago I think I even proved to myself that this hypothetical mechanical system is mathematically equivalent to doing a linear regression, since the system naturally tries to minimize the potential energy.
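
Here's a small numerical check of that equivalence in Python, assuming ideal vertical springs with zero rest length and equal stiffness `k` (my assumptions, and all function names are just for illustration). The spring energy is k/2 times the sum of squared residuals, so minimizing it is ordinary least squares, and at the least-squares line both the net force and the net moment about x = 0 come out to zero.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def spring_energy(a, b, xs, ys, k=1.0):
    """Potential energy of vertical zero-rest-length springs: (k / 2) * sum(r_i^2)."""
    return 0.5 * k * sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def net_force_and_moment(a, b, xs, ys, k=1.0):
    """Net vertical force on the line, and its first moment about x = 0."""
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    force = k * sum(residuals)
    moment = k * sum(r * x for r, x in zip(residuals, xs))
    return force, moment
```

Setting the force and moment to zero gives exactly the two normal equations of linear regression.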

New comment by sakras in "Proof-of-work to protect lore.kernel.org and git.kernel.org against AI crawlers"

sakras — Wed, 02 Apr 2025 22:52:01 +0000

Maybe I'm missing something, but why do people expect PoW to be effective against companies whose whole existence revolves around acquiring more compute?

New comment by sakras in "The insecurity of telecom stacks in the wake of Salt Typhoon"

sakras — Wed, 12 Mar 2025 18:35:56 +0000

New smart grids with new software do not require a rewrite!

New comment by sakras in "The insecurity of telecom stacks in the wake of Salt Typhoon"

sakras — Wed, 12 Mar 2025 17:15:33 +0000

I've been beating the drum about this to everyone who will listen lately, but I'll beat it here too! Why don't we use seL4 for everything? People are talking about moving to a smart grid, having IoT devices everywhere, putting chips inside of people's brains (!!!), connecting cars to the internet, etc.

Anyway, it's insane that we have a mathematically-proven secure kernel, we should use it! Surely there's a startup in this somewhere..

New comment by sakras in "The Demoralization is just Beginning"

sakras — Wed, 05 Mar 2025 06:42:50 +0000

> Sure, we don’t produce anything, but we have companies with high revenues and we can raise money based on those revenues. We’ll both be rich!

I think this is the central hole in the argument that the US is stagnant. The money that investors give you has to come from somewhere! Particularly in venture capital, you only get returns if you produce value.

Nevertheless, I do agree with a lot of the points here.

New comment by sakras in "Ask HN: What less-popular systems programming language are you using?"

sakras — Wed, 05 Mar 2025 01:48:02 +0000

> epoll is threaded, not multiprocess

epoll is orthogonal to threads. It _can_ be used in a multithreaded program, but it doesn't have to be. It may very well be implemented in terms of kernel threads, but that's not what I'm talking about. I'm talking about user-space threads.
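
To illustrate, a minimal single-threaded sketch in Python. It uses `selectors.DefaultSelector`, which is backed by epoll on Linux (and by other mechanisms elsewhere); the function name and the socketpair setup are just for illustration, and it assumes each message is non-empty and under 4096 bytes.

```python
import selectors
import socket


def multiplex_single_threaded(messages):
    """Read one message from each of several sockets using one thread and one selector."""
    sel = selectors.DefaultSelector()  # EpollSelector on Linux
    pairs = [socket.socketpair() for _ in messages]
    for i, (reader, writer) in enumerate(pairs):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)
        writer.sendall(messages[i])

    received = {}
    while len(received) < len(messages):
        # One event loop waits on all readers at once; no extra threads involved.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return [received[i] for i in range(len(messages))]
```

The same selector could just as well be driven from several threads; the readiness API doesn't care either way.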

New comment by sakras in "Ask HN: What less-popular systems programming language are you using?"

sakras — Tue, 04 Mar 2025 19:18:55 +0000

Conceptually yes, but I suspect it's going to be a lot hairier in practice. For instance, I think there's some stuff that needs language support, such as thread-local storage. I'd guess it would be simpler to just re-implement threading from scratch using syscalls. But I also don't think the language provides any support for atomics, so you'd have to roll your own there.

New comment by sakras in "Ask HN: What less-popular systems programming language are you using?"

sakras — Tue, 04 Mar 2025 19:15:34 +0000

Maybe I'm misunderstanding something, but it seems like ev is still multiprocessing? Reading the code, it looks like you can read/write to files, and if you want to kick off some other work it spawns a process. I don't see any instance of threads there.

New comment by sakras in "Ask HN: What less-popular systems programming language are you using?"

sakras — Tue, 04 Mar 2025 07:28:38 +0000

I was pretty excited about Hare until Devault said that Hare wouldn't be doing multithreading as he preferred multiprocessing. That was a pretty big dealbreaker for me. The rest of the language looks quite clean though!

New comment by sakras in "Tensor evolution: A framework for fast tensor computations using recurrences"

sakras — Tue, 18 Feb 2025 22:42:09 +0000

> Tensor Evolution (TeV)

Oh no, I'm never not going to read this as "teraelectronvolts". I wish they'd picked a different acronym!

New comment by sakras in "Exposed DeepSeek database leaking sensitive information, including chat history"

sakras — Thu, 30 Jan 2025 21:15:10 +0000

Ah interesting! I've definitely seen some French code somewhere with the variable names being all French as well, so it really was strange to see them be mixed.

New comment by sakras in "Exposed DeepSeek database leaking sensitive information, including chat history"

sakras — Thu, 30 Jan 2025 07:08:33 +0000

From what I’ve seen, code usually comes in one of two languages: English or French. Somehow everyone but the French speaks enough English to write code!

New comment by sakras in "SiFive's P550 Microarchitecture"

sakras — Tue, 28 Jan 2025 19:23:04 +0000

> A statement like this pretty much disqualifies your opinions

This is needlessly aggressive. I specialize in writing SIMD code. That is my job. I am very eager to get my hands on a RVV chip so that I can play with a new SIMD ISA. So obviously a non-SIMD chip is useless to me.

> Why would you expect it to have vector extensions?

Because it has "Performance" in the name. I double-checked the ISA and saw that it did not have V, and was disappointed.

> If you don't need RISC-V this year then

Why would anybody _need_ RISC-V? RISC-V is exciting because it has the possibility of giving the end user higher performance per dollar. Until it does that, it will be relegated to enthusiasts who just like playing with new ISAs.