<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: refibrillator</title><link>https://news.ycombinator.com/user?id=refibrillator</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 03 May 2026 03:43:16 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=refibrillator" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by refibrillator in "The hidden engineering of runways"]]></title><description><![CDATA[
<p>Hmm, similar patterns exist in distributed computer systems too, e.g. adding jitter to avoid thundering-herd effects.<p>This feels like an essential pattern of the universe or something…</p>
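<p>For anyone curious, the jitter idea looks roughly like this in practice (an illustrative Python sketch, not any particular system’s implementation):</p><pre><code>import random

def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with 'full jitter': wait a random amount between 0 and
    the capped exponential delay, so clients that all fail at the same moment
    do not all retry at the same moment."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay)

# Ten clients that failed simultaneously now retry at spread-out times.
for client in range(10):
    print(f"client {client} retries in {backoff_with_jitter(attempt=3):.2f}s")
</code></pre>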
]]></description><pubDate>Mon, 26 Jan 2026 23:24:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46773174</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46773174</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46773174</guid></item><item><title><![CDATA[New comment by refibrillator in "The Most Popular Blogs of Hacker News in 2025"]]></title><description><![CDATA[
<p>Sometimes I wonder if anyone else feels there is a halo effect around certain personalities on this site. When I see someone ending nearly every comment with a link to their blog or pet project, it gives me bad vibes, as if they have ulterior motives. Especially if a majority of their blog posts are content lifted from elsewhere with minimal additions. Perhaps this is just hustle culture, and YC alum status confers immunity from these types of criticisms. Perhaps my only wish is that other voices would bubble to the top in some of these threads.<p>In any case I’m truly grateful for this site as a whole, the good and the bad.</p>
]]></description><pubDate>Sat, 03 Jan 2026 19:57:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46480872</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46480872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46480872</guid></item><item><title><![CDATA[New comment by refibrillator in "Scientists unlock brain's natural clean-up system for new treatments for stroke"]]></title><description><![CDATA[
<p>Here’s a starting point:<p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5241507/#B1" rel="nofollow">https://pmc.ncbi.nlm.nih.gov/articles/PMC5241507/#B1</a><p>TLDR: NAC is a derivative of an amino acid called cysteine, so it is a precursor for one of the most important antioxidants in the body, and it can modulate key metabolic pathways associated with good health across a variety of organs. Notably, for decades it has been a universally successful antidote for acetaminophen (Tylenol) overdose. It’s available over the counter, but NAC is not naturally found in foods; eating cysteine-rich foods like chicken, turkey, yogurt, etc. is the next best bet.</p>
]]></description><pubDate>Thu, 01 Jan 2026 05:42:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46451625</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46451625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46451625</guid></item><item><title><![CDATA[New comment by refibrillator in "France targets Australia-style social media ban for children next year"]]></title><description><![CDATA[
<p>No disrespect, but paying to verify age feels absurd, let alone putting a private company in charge of what should be an essential function of the government.<p>How about this: when you turn 18 (or whatever the cutoff is), the government gives you a signed JWT that contains your DOB. Anyone who needs to verify your age can check that and simply validate the signature via a public key published by the government.<p>Simply grab a new JWT whenever you need one, to ensure privacy.<p>And sure, sprinkle in some laws that make it illegal to store or share JWTs for clearly fraudulent purposes.<p>> the vast majority of kids don't easily have access to alcohol or cigarettes<p>This feels like it comes from an affluent perspective. Where I grew up it was trivial to acquire these things and much worse; there will always be someone’s older brother etc. who will do this for $20 because he’s got nothing to lose.</p>
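<p>To sketch the idea (using the PyJWT library; the claim names, key files, and issuing flow here are illustrative assumptions, not any real government scheme):</p><pre><code># Hypothetical sketch of the signed-DOB idea with the PyJWT library (pip install pyjwt[crypto]).
import datetime
import jwt

GOV_PRIVATE_KEY = open("gov_private_key.pem").read()  # held only by the issuer
GOV_PUBLIC_KEY = open("gov_public_key.pem").read()    # published by the government

def issue_age_token(dob_iso: str) -> str:
    """Issuer side: sign a short-lived token carrying only the date of birth."""
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {"dob": dob_iso, "iat": now, "exp": now + datetime.timedelta(hours=1)}
    return jwt.encode(claims, GOV_PRIVATE_KEY, algorithm="RS256")

def is_over_18(token: str) -> bool:
    """Relying-party side: verify the signature, then check the age locally."""
    claims = jwt.decode(token, GOV_PUBLIC_KEY, algorithms=["RS256"])
    dob = datetime.date.fromisoformat(claims["dob"])
    today = datetime.date.today()
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return age >= 18
</code></pre><p>The relying party never learns anything beyond what the token carries, and grabbing a fresh token per use limits tracking across sites.</p>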
]]></description><pubDate>Wed, 31 Dec 2025 21:38:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46448632</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46448632</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46448632</guid></item><item><title><![CDATA[New comment by refibrillator in "Nvidia's $20B antitrust loophole"]]></title><description><![CDATA[
<p>H100 has 80 GB of <i>HBM3</i>. There’s only like 37 MB of SRAM on a single chip.</p>
]]></description><pubDate>Sat, 27 Dec 2025 20:31:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46404975</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46404975</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46404975</guid></item><item><title><![CDATA[New comment by refibrillator in "Weight loss jabs: What happens when you stop taking them"]]></title><description><![CDATA[
<p>Fascinatingly, the body already has a mechanism for this: fasting. One of the many beneficial side effects is rapid mucosal atrophy, decreasing villus height and crypt depth.<p>You can find evidence of this in the literature, but it’s absurdly understudied, because big pharma would rather sell you a subscription to life.<p>Fortunately there are many good people in the world, especially in the field of medicine, who want to help their patients unconditionally. So there are glimmers of hope, like some of the top cardiologists in the world going against the status quo and treating patients with fasting regimens instead of surgery.</p>
]]></description><pubDate>Sun, 21 Dec 2025 21:39:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46348791</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46348791</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46348791</guid></item><item><title><![CDATA[New comment by refibrillator in "Kids Rarely Read Whole Books Anymore. Even in English Class"]]></title><description><![CDATA[
<p>This is hilarious, I don’t even want to know if it’s legit.</p>
]]></description><pubDate>Sun, 14 Dec 2025 00:44:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=46259770</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46259770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46259770</guid></item><item><title><![CDATA[New comment by refibrillator in "When would you ever want bubblesort? (2023)"]]></title><description><![CDATA[
<p>Love anecdotes like this! But admittedly I feel a bit lost, so please forgive my ignorance when I ask: why does choosing a subset of k integers at random require deduplication? My naive intuition is that sampling without replacement can be done in linear time (hash table to track chosen elements?). I’m probably not understanding the problem formulation here.</p>
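<p>In case it helps anyone reading along, the hash-table approach I had in mind is roughly this (expected linear time when k is much smaller than n; it degrades as k approaches n):</p><pre><code>import random

def sample_without_replacement(n: int, k: int) -> list[int]:
    """Pick k distinct integers from range(n) by rejecting duplicates with a
    hash set. Expected O(k) time when k << n, since collisions are rare."""
    chosen = set()
    while len(chosen) < k:
        chosen.add(random.randrange(n))  # duplicates are simply absorbed by the set
    return list(chosen)
</code></pre>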
]]></description><pubDate>Thu, 11 Dec 2025 06:11:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46228225</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46228225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46228225</guid></item><item><title><![CDATA[New comment by refibrillator in "Google unkills JPEG XL?"]]></title><description><![CDATA[
<p>One of the cooler and lesser known features of JPEG XL is a mode to losslessly transcode from JPEG while achieving ~20% space reduction. It’s reversible too because the original entropy coded bitstream is untouched.<p>Notably GCP is rolling this out to their DICOM store API, so you get the space savings of JXL but can transcode on the fly for applications that need to be served JPEG.<p>Only know this because we have tens of PBs in their DICOM store and stand to save a substantial amount of $ on an absurdly large annual bill.<p>Native browser support is on our wishlist and our contacts indicate the chrome team will get there eventually.</p>
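<p>For concreteness, the round trip looks roughly like this with the libjxl command-line tools (sketched in Python; exact flags may vary between libjxl versions):</p><pre><code># Rough sketch of the lossless JPEG <-> JXL round trip using the libjxl CLI tools.
# Assumes cjxl/djxl are installed; recent versions recompress JPEG input losslessly by default.
import subprocess

# Recompress: the original entropy-coded JPEG data is repacked, not re-encoded,
# typically saving on the order of 20% of the file size.
subprocess.run(["cjxl", "scan.jpg", "scan.jxl"], check=True)

# Reconstruct: requesting a .jpg output reproduces the original JPEG bit-exactly,
# so clients that only speak JPEG can still be served on the fly.
subprocess.run(["djxl", "scan.jxl", "scan_restored.jpg"], check=True)
</code></pre>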
]]></description><pubDate>Tue, 02 Dec 2025 00:36:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=46115733</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46115733</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46115733</guid></item><item><title><![CDATA[New comment by refibrillator in "We're learning more about what Vitamin D does"]]></title><description><![CDATA[
<p>Yeah it’s pretty clearly a bot account, or at least someone who likes to copy paste from chatgpt to sound smart.</p>
]]></description><pubDate>Sat, 29 Nov 2025 18:48:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=46089787</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46089787</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46089787</guid></item><item><title><![CDATA[New comment by refibrillator in "Claude Advanced Tool Use"]]></title><description><![CDATA[
<p>> It works better!<p>> I strongly believe it is one of the best technologies for AI agents<p>Do you have any quantitative evidence to support this?<p>Sincere question. I feel it would add some much needed credibility in a space where many folks are abusing the hype wave and low key shilling their products with vibes instead of rigor.</p>
]]></description><pubDate>Mon, 24 Nov 2025 22:03:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46039938</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=46039938</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46039938</guid></item><item><title><![CDATA[New comment by refibrillator in "We bought the whole GPU, so we're damn well going to use the whole GPU"]]></title><description><![CDATA[
<p>Ha, made me chuckle. For those wondering seriously about this: it’s not a viable optimization, because weights are not readily compressible via JPEG/DCT, and there are only a limited number of these decode units on the chip, which bottlenecks throughput; it ends up being slower than simply reading uncompressed weights from HBM.</p>
]]></description><pubDate>Thu, 02 Oct 2025 17:47:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=45452948</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=45452948</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45452948</guid></item><item><title><![CDATA[New comment by refibrillator in "We reverse-engineered Flash Attention 4"]]></title><description><![CDATA[
<p>Great exposition, loved the touch of humor. Please do the backward pass when it’s published.<p>As a fellow Tri Dao groupie and lucky duck who gets to build on Hopper/Blackwell clusters, I find it amazing how difficult it is becoming to write kernels that saturate GPU hardware.<p>When I squint, there appears to be a trend emerging across work like FA4, monolithic (mega) kernels, etc. Namely, a subversion of the classic CUDA programming model in the form of fine grained task based parallelism, managed entirely in “user space”.<p>Not exactly sure what’s ahead but I’m strapping in for a wild ride…</p>
]]></description><pubDate>Sun, 28 Sep 2025 00:47:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=45400753</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=45400753</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45400753</guid></item><item><title><![CDATA[New comment by refibrillator in "Why LLMs Can't Write Q/Kdb+: Writing Code Right-to-Left"]]></title><description><![CDATA[
<p>Well “import torch” for example will resolve certain dynamically linked symbols, which must happen before importing your own .so code that uses libtorch and pybind11. If not, you will get a super-fun-to-debug segfault, leaving you staring at gdb backtrace output while you ponder your career choices.<p>This is buried deep in the PyTorch docs and I don’t have the willpower to go find it right now, sorry.</p>
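<p>Roughly, the fix looks like this (my_ext is a hypothetical pybind11 extension compiled against libtorch):</p><pre><code># Minimal sketch of the import-order fix described above.
import torch    # load libtorch's shared libraries first so their symbols resolve
import my_ext   # hypothetical .so built with pybind11 + libtorch; importing it first can segfault
</code></pre>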
]]></description><pubDate>Thu, 10 Jul 2025 03:46:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44516992</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=44516992</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44516992</guid></item><item><title><![CDATA[New comment by refibrillator in "Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken"]]></title><description><![CDATA[
<p>Tokenization is typically done on CPU and is rarely (if ever) a bottleneck for training or inference.<p>GPU kernels typically dominate in terms of wall clock time, the only exception might be very small models.<p>Thus the latency of tokenization can essentially be “hidden”, by having the CPU prepare the next batch while the GPU finishes the current batch.</p>
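<p>A toy sketch of that overlap (tokenize and run_on_gpu are stand-ins for whatever your stack actually uses, not real APIs):</p><pre><code>import queue
import threading

def prefetch_batches(texts, batch_size, tokenize, out_q):
    """CPU side: tokenize batches ahead of time and hand them to the GPU loop."""
    for i in range(0, len(texts), batch_size):
        out_q.put(tokenize(texts[i:i + batch_size]))
    out_q.put(None)  # sentinel: no more batches

def run(texts, batch_size, tokenize, run_on_gpu):
    q = queue.Queue(maxsize=2)  # small buffer keeps the CPU at most one batch ahead
    threading.Thread(target=prefetch_batches,
                     args=(texts, batch_size, tokenize, q), daemon=True).start()
    while (batch := q.get()) is not None:
        run_on_gpu(batch)  # GPU work overlaps with tokenization of the next batch
</code></pre><p>In practice frameworks handle this for you (e.g. a DataLoader with worker processes), but the principle is the same.</p>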
]]></description><pubDate>Mon, 30 Jun 2025 17:49:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44426054</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=44426054</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44426054</guid></item><item><title><![CDATA[New comment by refibrillator in "Compiling LLMs into a MegaKernel: A path to low-latency inference"]]></title><description><![CDATA[
<p>Hi author(s), the on-GPU interpreter approach looks like a promising path forward, have you seen this strikingly similar concurrent work?<p><a href="https://news.ycombinator.com/item?id=44111673">https://news.ycombinator.com/item?id=44111673</a><p>I find it curious that fundamentals of the CUDA programming model (eg kernel launches) are being subverted in favor of fine grained task based parallelism that ends up using the hardware more effectively. Makes me wonder if CUDA has been holding us back in some ways.<p>What are the chances we see your work land in PyTorch as an experimental backend?<p>Awesome stuff thanks for sharing.<p>P.S. minor typo, your first two paragraphs under part 1 are nearly identical.</p>
]]></description><pubDate>Fri, 20 Jun 2025 01:33:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=44324001</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=44324001</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44324001</guid></item><item><title><![CDATA[New comment by refibrillator in "Tokasaurus: An LLM inference engine for high-throughput workloads"]]></title><description><![CDATA[
<p>The code has few comments, but you gotta love it when you can tell someone was having fun!<p><a href="https://github.com/ScalingIntelligence/tokasaurus/blob/65efbfb5457b2b08dd562d4658f99ff2687f078e/tokasaurus/manager/scheduler.py#L588">https://github.com/ScalingIntelligence/tokasaurus/blob/65efb...</a><p>I’m honestly impressed that a pure Python implementation can beat out vLLM and SGLang. Granted, they lean on FlashInfer, and of course torch.compile has gotten incredibly powerful in the last few years. Though dynamic shapes have still been a huge thorn in my side, so I’ll need to look closer at how they pulled it off…</p>
]]></description><pubDate>Fri, 06 Jun 2025 02:05:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44197239</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=44197239</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44197239</guid></item><item><title><![CDATA[New comment by refibrillator in "Why DeepSeek is cheap at scale but expensive to run locally"]]></title><description><![CDATA[
<p>> Unsloth Dynamic GGUF which, quality wise in real-world use performs very close to the original<p>How close are we talking?<p>I’m not calling you a liar OP, but in general I wish people perpetuating such broad claims would be more rigorous.<p>Unsloth does amazing work, however as far as I’m aware even they themselves do not publish head to head evals with the original unquantized models.<p>I have sympathy here because very few people and companies can afford to run the original models, let alone engineer rigorous evals.<p>However I felt compelled to comment because my experience does not match. For relatively simple usage the differences are hard to notice, but they become much more apparent in high complexity and long context tasks.</p>
]]></description><pubDate>Sun, 01 Jun 2025 20:59:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44153725</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=44153725</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44153725</guid></item><item><title><![CDATA[New comment by refibrillator in "Lossless LLM compression for efficient GPU inference via dynamic-length float"]]></title><description><![CDATA[
<p>Note to others reading along: in the last appendix page the OP paper reports DFloat11 reduces tokens/sec by ~2-3x for the Llama-3.1-8b, Qwen-2.5-14b/32b, and Mistral-small-24b models (the throughput penalty is not reported for the others).<p>Using DFloat11, tokens/sec was higher only relative to running inference with some layers offloaded to CPU.<p>Classic comp sci tradeoff between space and speed, no free lunch, etc.</p>
]]></description><pubDate>Sat, 26 Apr 2025 02:48:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43800470</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=43800470</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43800470</guid></item><item><title><![CDATA[New comment by refibrillator in "The Llama 4 herd"]]></title><description><![CDATA[
<p>> the actual processing happens in 17B<p>This is a common misconception of how MoE models work. To be clear, 17B parameters are activated for <i>each token generated</i>.<p>In practice you will almost certainly be pulling the full 109B parameters through the CPU/GPU cache hierarchy to generate non-trivial output, or at least a significant fraction of that.</p>
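<p>A toy simulation makes the point (the layer/expert counts and uniform routing are made-up assumptions for illustration, not Llama 4’s actual configuration):</p><pre><code>import random

# Per token only top_k experts per layer are active, but different tokens pick
# different experts, so a long generation touches nearly all of the expert weights.
n_layers, n_experts, top_k, n_tokens = 48, 16, 1, 256
touched = [set() for _ in range(n_layers)]

for _ in range(n_tokens):
    for layer in range(n_layers):
        touched[layer].update(random.sample(range(n_experts), top_k))

fraction = sum(len(t) for t in touched) / (n_layers * n_experts)
print(f"fraction of expert weights touched after {n_tokens} tokens: {fraction:.0%}")
</code></pre>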
]]></description><pubDate>Sat, 05 Apr 2025 19:08:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43595987</link><dc:creator>refibrillator</dc:creator><comments>https://news.ycombinator.com/item?id=43595987</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43595987</guid></item></channel></rss>