<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: gajjanag</title><link>https://news.ycombinator.com/user?id=gajjanag</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 27 Apr 2026 18:50:45 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=gajjanag" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by gajjanag in "TurboQuant: A first-principles walkthrough"]]></title><description><![CDATA[
<p>> Thanks for that! It is worth noting that taking advantage of the post-rotation distribution<p>I again feel this claim is too strong. Rotations have been used in information theory and wireless communications for decades at this point, with appropriate scaling applied at channel inputs/outputs to hit channel capacity. The signals are then passed through codebooks designed for the post-rotation, whitened distribution.<p>Our cellphones today are powered by such technology.<p>I agree with your claim when restricted to deep learning. But I do not agree with the broad characterization that taking advantage of post-rotation distributions was first done in your work.</p>
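<p>To make the point concrete, here is a minimal numpy sketch (my own illustration, not from either paper) of the classical rotate-then-quantize trick: a random orthogonal rotation turns a heavy-tailed vector into one whose coordinates look i.i.d. Gaussian, so a fixed Gaussian-matched codebook can be applied per coordinate.</p><pre><code>import numpy as np

rng = np.random.default_rng(0)
n = 1024

# Heavy-tailed input, e.g. activations or a peaky channel input.
x = rng.laplace(size=n)

# Random orthogonal rotation: Q from the QR decomposition of a Gaussian
# matrix is Haar-distributed. Practical schemes use a randomized
# Hadamard transform for O(n log n) cost instead.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
y = Q @ x

def excess_kurtosis(v):
    v = (v - v.mean()) / v.std()
    return (v**4).mean() - 3.0

print("before rotation:", excess_kurtosis(x))  # around 3 for Laplace
print("after rotation: ", excess_kurtosis(y))  # near 0, i.e. Gaussian-like
</code></pre>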
]]></description><pubDate>Mon, 27 Apr 2026 14:19:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47921996</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=47921996</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47921996</guid></item><item><title><![CDATA[New comment by gajjanag in "TurboQuant: A first-principles walkthrough"]]></title><description><![CDATA[
<p>Wow, yes - you are completely correct (I have now read through the note in detail).<p>Though, as your paper also notes, the quantizer values themselves aren't fundamentally novel to either paper. Lloyd-Max scalar quantizers have been studied for a very, very long time, and the specific Lloyd-Max values for the Gaussian input distribution have been derived in many papers across signal processing and information theory.</p>
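<p>For reference, the classical values are easy to reproduce. A minimal sketch of the Lloyd-Max iteration for a 2-bit quantizer on N(0, 1) (sample-based for brevity; the classical papers use exact Gaussian integrals):</p><pre><code>import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)

bits = 2
levels = np.linspace(-1.5, 1.5, 2**bits)  # initial reconstruction levels

for _ in range(50):
    # Nearest-neighbor rule: decision boundaries at level midpoints.
    boundaries = (levels[:-1] + levels[1:]) / 2
    idx = np.digitize(samples, boundaries)
    # Centroid rule: each level is the conditional mean of its cell.
    levels = np.array([samples[idx == i].mean() for i in range(2**bits)])

print(np.round(levels, 3))
# Converges to roughly [-1.51, -0.453, 0.453, 1.51], the classical
# 2-bit Lloyd-Max levels for a standard Gaussian.
</code></pre>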
]]></description><pubDate>Mon, 27 Apr 2026 13:59:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47921686</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=47921686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47921686</guid></item><item><title><![CDATA[New comment by gajjanag in "TurboQuant: A first-principles walkthrough"]]></title><description><![CDATA[
<p>There are also more papers on similar themes.<p>For example, TurboQuant makes use of QJL (quantized Johnson-Lindenstrauss transforms). One of the first papers to characterize QJL, and in fact the rate-distortion tradeoff for quantized matrix multiplication in general, is "Optimal Quantization for Matrix Multiplication" (<a href="https://arxiv.org/abs/2410.13780" rel="nofollow">https://arxiv.org/abs/2410.13780</a>) by Ordentlich and Polyanskiy.<p>There is also a more accessible survey of quantized matrix multiplication, "High-Rate Quantized Matrix Multiplication: Theory and Practice" (<a href="https://arxiv.org/abs/2601.17187" rel="nofollow">https://arxiv.org/abs/2601.17187</a>), by the same authors.<p>TurboQuant cites none of them.</p>
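<p>For readers unfamiliar with QJL, here is a minimal sketch (my own, not the estimator from any of these papers) of the underlying idea of one-bit quantized random projections: keep only the sign of each projected coordinate and recover inner products from the fraction of mismatched bits.</p><pre><code>import numpy as np

rng = np.random.default_rng(0)
d, m = 256, 4096  # input dimension, number of random projections

x = rng.standard_normal(d)
y = 0.7 * x + 0.3 * rng.standard_normal(d)  # a correlated pair

G = rng.standard_normal((m, d))
bx, by = np.sign(G @ x), np.sign(G @ y)  # 1 bit per projection

# P[sign mismatch] = angle(x, y) / pi, so the mismatch rate estimates
# the angle, and the norms recover the inner product.
angle_est = np.pi * np.mean(bx != by)
ip_est = np.linalg.norm(x) * np.linalg.norm(y) * np.cos(angle_est)

print("true inner product:     ", x @ y)
print("estimate from sign bits:", ip_est)
</code></pre>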
]]></description><pubDate>Mon, 27 Apr 2026 11:17:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47920133</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=47920133</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47920133</guid></item><item><title><![CDATA[New comment by gajjanag in "The RAM shortage could last years"]]></title><description><![CDATA[
<p>TurboQuant is known across the industry not to be state of the art. There are superior schemes for KV-cache quantization at every bitrate, e.g. SpectralQuant (<a href="https://github.com/Dynamis-Labs/spectralquant" rel="nofollow">https://github.com/Dynamis-Labs/spectralquant</a>), among many, many papers.<p>> Given that TurboQuant results in a 6x reduction in memory usage for KV caches<p>It all depends on the baseline. The "6x" is relative to a BF16 KV cache, not to a state-of-the-art 8- or 4-bit KV cache scheme.</p>
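<p>The arithmetic makes the baseline dependence obvious (illustrative numbers, not tied to any particular model):</p><pre><code># BF16 stores 16 bits per KV-cache entry, so "6x vs BF16" means
# roughly 16/6 = 2.67 bits per entry.
turbo_bits = 16 / 6

print(f"vs BF16 (16 bit): {16 / turbo_bits:.1f}x")  # 6.0x
print(f"vs int8 KV cache: {8 / turbo_bits:.1f}x")   # only 3.0x
print(f"vs int4 KV cache: {4 / turbo_bits:.1f}x")   # only 1.5x
</code></pre>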
]]></description><pubDate>Sun, 19 Apr 2026 12:55:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47823958</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=47823958</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47823958</guid></item><item><title><![CDATA[New comment by gajjanag in "The GNU libc atanh is correctly rounded"]]></title><description><![CDATA[
<p>The bigger challenge is on GPUs/NPUs. Branches for a fast vs. an accurate path get costlier there, among other things; on CPUs this costs much less.<p>Most published libm implementations on the GPU/NPU side allow a few ULP of error as a performance vs. accuracy tradeoff. E.g., this is documented explicitly in the CUDA programming guide: <a href="https://docs.nvidia.com/cuda/cuda-programming-guide/05-appendices/mathematical-functions.html" rel="nofollow">https://docs.nvidia.com/cuda/cuda-programming-guide/05-appen...</a>.<p>Prof. Zimmermann and collaborators maintain a great table at <a href="https://members.loria.fr/PZimmermann/papers/accuracy.pdf" rel="nofollow">https://members.loria.fr/PZimmermann/papers/accuracy.pdf</a> (Feb 2026) comparing various libm implementations with respect to accuracy.</p>
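<p>As a sketch of how such accuracy tables are produced (mine, with float64 standing in for the correctly rounded reference; the real evaluations use exhaustive search against MPFR): compute the function in float32, compare against the reference, and report the worst error in ULPs.</p><pre><code>import numpy as np

xs = np.linspace(-0.99, 0.99, 1_000_000, dtype=np.float32)

approx = np.arctanh(xs)                    # float32 implementation under test
ref = np.arctanh(xs.astype(np.float64))    # higher-precision reference

ulp = np.spacing(np.abs(approx))           # size of 1 ULP at each output
err_ulp = np.abs(approx.astype(np.float64) - ref) / ulp
print("max error:", err_ulp.max(), "ULP")
</code></pre>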
]]></description><pubDate>Sat, 18 Apr 2026 11:41:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47815111</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=47815111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47815111</guid></item><item><title><![CDATA[New comment by gajjanag in "Don't become an engineering manager"]]></title><description><![CDATA[
<p>> That's why you need to put your scope<p>The problem is, "scope" is often equated with "how many people worked in my empire" rather than "how much business value did my work generate".<p>The two things are vastly different, and I have seen the oversimplification play out over and over in my own career as well as in many others around me.<p>As an extreme on the individual-technical-expert side, there are things out there that can pretty much only be accomplished by the handful of people around the world who possess the dedicated expertise. Those results can't be replicated by a cobbled-together team of 10 or 100 people, even though the latter sounds more impressive as "scope".<p>Some organizations do a decent job of recognizing these different archetypes; many don't.</p>
]]></description><pubDate>Wed, 04 Mar 2026 01:23:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47241768</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=47241768</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47241768</guid></item><item><title><![CDATA[New comment by gajjanag in "The state of SIMD in Rust in 2025"]]></title><description><![CDATA[
<p>> 80%-90% or so of real life vectorization can be achieved in C or C++ just by writing code in a way that it can be autovectorized.<p>Yep. I was pleasantly surprised by the autovectorization quality of recent clang at work a few days ago. If you write code whose trip counts the compiler can infer to be multiples of 4, 8, etc., it goes off and emits pretty decent NEON/AVX code. The rest, as you say, is handled quite well by intrinsics these days.<p>Autovectorization was definitely poorer 5-10 years ago on older compiler toolchains.</p>
]]></description><pubDate>Thu, 06 Nov 2025 14:38:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=45835744</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=45835744</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45835744</guid></item><item><title><![CDATA[New comment by gajjanag in "Advice for new principal tech ICs (i.e., notes to myself)"]]></title><description><![CDATA[
<p>Welcome to the brave new world these days:<p>1 - Very few people conduct proper scholarship; most fail to trace ideas back to their original inception and cite them correctly. This happens time and again in deep learning, where 30+ year old ideas are claimed as "novel" over and over - many times out of malice by the authors, sometimes out of ignorance.<p>2 - Peer review in many parts of industry and research is a joke: an incredibly noisy process, mostly shouldered by early-stage graduate students who don't yet know the field well.<p>3 - It is common practice now to dump out one's "kitchen sink" of ideas rather than properly refined work. Hence the rise of the LinkedIn-spam, blog-spam, arXiv-spam style of papers.</p>
]]></description><pubDate>Sat, 25 Oct 2025 22:30:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45707492</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=45707492</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45707492</guid></item><item><title><![CDATA[New comment by gajjanag in "In Defense of C++"]]></title><description><![CDATA[
<p>>  I don't think there are many (or any) upsides to the well documented downsides.<p>C++ template metaprogramming still remains extremely powerful. Projects like CUTLASS could not be written in Rust with the same performance in as ergonomic a way.<p>There is a reason the ML infra community mostly goes with Python-like DSLs or template metaprogramming frameworks.<p>Last I checked, there are no alternatives at scale for this.</p>
]]></description><pubDate>Wed, 17 Sep 2025 15:38:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=45277154</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=45277154</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45277154</guid></item><item><title><![CDATA[New comment by gajjanag in "Defeating Nondeterminism in LLM Inference"]]></title><description><![CDATA[
<p>As others have pointed out, these phenomena are well known to many folks across companies in the AI infra space; the article doesn't really break new ground. It is a good exposition of the basic strategies, though.<p>What I would have loved is a discussion of collectives and multi-node setups, showing how to get determinism at a low performance penalty for multi-node reduction collectives.</p>
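<p>A toy illustration (mine, not from the article) of why the reduction collectives are the sticking point: float addition is not associative, so combining per-rank partial sums in arrival order gives run-to-run differences, while a static combination order restores determinism.</p><pre><code>import numpy as np

rng = np.random.default_rng(0)
partials = rng.standard_normal(1024).astype(np.float32)  # one partial sum per "rank"

def arrival_order_sum(p, seed):
    # Mimics a reduction that consumes contributions as they arrive.
    acc = np.float32(0.0)
    for i in np.random.default_rng(seed).permutation(len(p)):
        acc += p[i]
    return acc

print(arrival_order_sum(partials, 1) == arrival_order_sum(partials, 2))  # usually False

def tree_sum(p):
    # Static pairwise tree keyed on rank ID; assumes a power-of-two
    # rank count for brevity. Same combination order every run.
    while len(p) > 1:
        p = p[0::2] + p[1::2]
    return p[0]

print(tree_sum(partials))  # identical on every run, whatever the arrival order
</code></pre>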
]]></description><pubDate>Thu, 11 Sep 2025 06:01:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=45208326</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=45208326</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45208326</guid></item><item><title><![CDATA[New comment by gajjanag in "SF may soon ban natural gas in homes and businesses undergoing major renovations"]]></title><description><![CDATA[
<p>I guess you have never worked with a slow induction cooktop. We literally had to spend 15 minutes more cooking things on induction compared with our previous apartment's gas connection.<p>Maybe they are better now, but it is certainly not the case that all induction cooktops have these magical properties; many are cheap and skimp on something. Meanwhile, across the 5+ apartments I have lived in, gas has always delivered the same reliable heating experience.<p>And to your point about rotis: no, it cannot be done unless you get a different, heavier-bottomed pan suitable for induction. Exactly what I was saying regarding the replacement costs.</p>
]]></description><pubDate>Sun, 27 Jul 2025 17:28:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44702906</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=44702906</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44702906</guid></item><item><title><![CDATA[New comment by gajjanag in "SF may soon ban natural gas in homes and businesses undergoing major renovations"]]></title><description><![CDATA[
<p>+1 - there are just so many Asian recipes that cannot be made anywhere near as easily on induction stovetops (high heat from a direct flame for flatbreads, etc).<p>Plus a whole bunch of cookware doesn't work with induction (clay pots, non-ferromagnetic bases, etc). I do wonder if any of these "environmental" estimates factor in the environmental cost of replacing a bunch of cookware just to satisfy induction requirements.</p>
]]></description><pubDate>Sun, 27 Jul 2025 15:45:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=44702151</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=44702151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44702151</guid></item><item><title><![CDATA[New comment by gajjanag in "Too Many Open Files"]]></title><description><![CDATA[
<p>There are a vast number of sysctls in xnu that have not really been re-examined in over 15 years. Many tunings, for example, date back to the spinning-rust era. There are plenty of examples like this.<p>Disclaimer: I worked at Apple and poked at xnu a bit.</p>
]]></description><pubDate>Sat, 07 Jun 2025 14:58:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44210097</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=44210097</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44210097</guid></item><item><title><![CDATA[New comment by gajjanag in "Developers, don't despair, big tech and AI hype is off the rails again"]]></title><description><![CDATA[
<p>The big problem is that a bunch of folks actually take these things seriously and use them as an excuse to freeze the junior hiring pipeline.<p>At the senior levels, the powers that be do not actually believe this, since a bunch of hiring is still happening to compensate for overdone layoffs in spots, etc.</p>
]]></description><pubDate>Wed, 14 May 2025 13:42:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43984434</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=43984434</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43984434</guid></item><item><title><![CDATA[New comment by gajjanag in "Career Development: What It Means to Be a Manager, Director, or VP (2015)"]]></title><description><![CDATA[
<p>> Large corporations believe anyone is replaceable.<p>This is definitely true. By design, large corporations are structured so that there is no single point of failure.<p>> Again I am an IC & don’t see/hear any extra work done for retention.<p>Even in large corporations, extra work definitely happens for retention (I have experienced it myself as an IC). Even though everyone is by design replaceable, the organization has some incentive to work on retention:<p>a) Bad retention hurts the organization's reputation and future hiring (horror stories spread very fast).<p>b) Within the team, losing a great teammate hurts morale and output, and managers know it will result in a hit on their metrics at least for the next half.<p>c) Managers may not always be able to backfill, and losing an employee can shrink the "empire" they are often trying so hard to build at whatever cost.</p>
]]></description><pubDate>Fri, 21 Mar 2025 15:24:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=43436778</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=43436778</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43436778</guid></item><item><title><![CDATA[New comment by gajjanag in "Apple's Software Quality Crisis"]]></title><description><![CDATA[
<p>Same. The compensation is substantially better at FAANG, but actual on-the-ground work is almost never what gets rewarded.<p>Meta-work (lots of "cross-functional" documents, alignment meetings, sync-ups with senior tech leads to brown-nose, deliberately creating low-quality output to justify hiring more people and growing one's "scope") is 90% of it.<p>Any actual output is largely accidental, coming from the 20% who are still naive or idealistic enough to actually care about what they produce.</p>
]]></description><pubDate>Tue, 04 Mar 2025 03:24:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43249911</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=43249911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43249911</guid></item><item><title><![CDATA[New comment by gajjanag in "An analysis of DeepSeek's R1-Zero and R1"]]></title><description><![CDATA[
<p>This is much more nuanced now. See Apple "Private Cloud Compute": <a href="https://security.apple.com/blog/private-cloud-compute/" rel="nofollow">https://security.apple.com/blog/private-cloud-compute/</a>; they run a lot of the larger models on their own servers.<p>Fundamentally, it is more efficient to process a batch of tokens from multiple users' requests than to process a single user's request on device.</p>
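<p>A back-of-the-envelope roofline sketch (illustrative numbers, not Apple's) of why batching wins: each decoded token must read the full weight matrix, and a batch of B requests amortizes that read over B tokens.</p><pre><code># For a (d, d) weight matrix in BF16, a batch of B requests does
# 2*B*d*d FLOPs against a single 2*d*d-byte weight read.
d = 8192  # hypothetical hidden size

for B in (1, 8, 64):
    flops = 2 * B * d * d
    weight_bytes = 2 * d * d
    print(f"batch {B:2d}: {flops / weight_bytes:.0f} FLOPs per weight byte")
# batch 1 is ~1 FLOP/byte, hopelessly memory-bound on any modern
# accelerator; batch 64 has 64x the arithmetic intensity.
</code></pre>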
]]></description><pubDate>Wed, 29 Jan 2025 20:12:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=42870502</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=42870502</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42870502</guid></item><item><title><![CDATA[New comment by gajjanag in ""Nvidia is so far ahead that all the 4090s are nerfed to half speed""]]></title><description><![CDATA[
<p>Maybe on a particular model/dataset, but extremely unlikely in general. Again, as another commenter pointed out: if you truly believe it isn't that hard, we would love to hire you at Meta ;)</p>
]]></description><pubDate>Tue, 17 Dec 2024 15:58:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=42442402</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=42442402</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42442402</guid></item><item><title><![CDATA[New comment by gajjanag in "Building Meta's GenAI infrastructure"]]></title><description><![CDATA[
<p>Our group works on some of this stuff at Meta, and we have a pretty good diversity of backgrounds: high-performance computing (the bulk), computer systems, compilers, ML engineering, etc. We are hiring.<p>Feel free to DM me to learn more.</p>
]]></description><pubDate>Wed, 13 Mar 2024 00:06:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=39686438</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=39686438</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39686438</guid></item><item><title><![CDATA[New comment by gajjanag in "LLM in a Flash: Efficient LLM Inference with Limited Memory"]]></title><description><![CDATA[
<p>lmkd (the low memory killer daemon) works fairly differently, driven by a different set of signals and a different policy. But yes, conceptually they try to achieve the same goal.<p>I also do not know whether Android combines system libraries into one big file for the savings, something Apple devices do.</p>
]]></description><pubDate>Thu, 21 Dec 2023 01:20:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=38715971</link><dc:creator>gajjanag</dc:creator><comments>https://news.ycombinator.com/item?id=38715971</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38715971</guid></item></channel></rss>