<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: martinloretz</title><link>https://news.ycombinator.com/user?id=martinloretz</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 02 May 2026 19:29:06 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=martinloretz" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by martinloretz in "ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models (2023)"]]></title><description><![CDATA[
<p>I think this paper is the key to the next speedup in local LLM inference. By making the model's activations sparse (using the ReLU activation), we can skip around 80% of the memory accesses and computation in the feed-forward layers. ReLU sets an activation to zero whenever it is negative, and since anything multiplied by zero is zero, the next layer doesn't need to load the weight rows or columns that would only ever be multiplied by those zeros.</p><p>Unfortunately, there aren't many models currently trained with ReLU activations.</p>
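<p>A minimal numpy sketch of the argument above (an illustration, not the paper's implementation; the layer sizes and random weights are made up for the demo): after ReLU, only the nonzero activations contribute to the next matmul, so the matching columns of the down-projection can be gathered and the rest never loaded.</p>
<pre><code>import numpy as np

# Toy feed-forward block: h = ReLU(W_up @ x), y = W_down @ h.
# Shapes and weights are arbitrary demo values.
rng = np.random.default_rng(0)
d_model, d_ff = 512, 2048
W_up = rng.standard_normal((d_ff, d_model))
W_down = rng.standard_normal((d_model, d_ff))
x = rng.standard_normal(d_model)

h = np.maximum(W_up @ x, 0.0)  # ReLU: negative activations become exactly 0
active = np.nonzero(h)[0]      # neurons that survived

# Dense path reads all of W_down; the sparse path only reads the columns
# paired with nonzero activations -- same result, far fewer weight loads.
y_dense = W_down @ h
y_sparse = W_down[:, active] @ h[active]

assert np.allclose(y_dense, y_sparse)
# Random weights give ~50% zeros; trained ReLU LLMs are reported to be
# much sparser, which is where the claimed savings come from.
print(f"active: {len(active)}/{d_ff} neurons")
</code></pre>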
]]></description><pubDate>Thu, 20 Feb 2025 14:27:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=43115032</link><dc:creator>martinloretz</dc:creator><comments>https://news.ycombinator.com/item?id=43115032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43115032</guid></item><item><title><![CDATA[ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models (2023)]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2310.04564">https://arxiv.org/abs/2310.04564</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43114770">https://news.ycombinator.com/item?id=43114770</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 20 Feb 2025 14:05:59 +0000</pubDate><link>https://arxiv.org/abs/2310.04564</link><dc:creator>martinloretz</dc:creator><comments>https://news.ycombinator.com/item?id=43114770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43114770</guid></item><item><title><![CDATA[New comment by martinloretz in "Show HN: A GPU-accelerated binary vector index"]]></title><description><![CDATA[
<p>Great work. Can you elaborate on how the radix selection works, and how you got it working with floats and inner-product distance? I had a quick look at the code; I'm not familiar with radix selection, but I'm really interested in building extremely fast GPU indices.</p>
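<p>For what it's worth, one common trick for radix selection over floats (a sketch under that assumption, not a description of this repo's code) is to reinterpret each IEEE-754 float as an unsigned integer with the same ordering, then select one digit at a time from the most significant byte; for inner-product search the scores are dot products and you keep the k largest:</p>
<pre><code>import numpy as np

def float_to_ordered_uint(f):
    """Monotonic float32 -> uint32 map: a < b iff map(a) < map(b).
    Positives get the sign bit set; negatives are fully bit-flipped.
    (Assumes no NaNs among the scores.)"""
    u = np.asarray(f, dtype=np.float32).view(np.uint32)
    return np.where(u & 0x80000000, ~u, u | 0x80000000)

def radix_select_topk(scores, k):
    """Indices of the k largest scores, one 8-bit digit at a time."""
    keys = float_to_ordered_uint(scores)
    idx = np.arange(len(keys))
    picked = []
    for shift in (24, 16, 8, 0):
        digits = ((keys >> shift) & 0xFF).astype(np.intp)
        hist = np.bincount(digits, minlength=256)
        above = 0                          # keys strictly above the pivot digit
        for d in range(255, -1, -1):       # scan buckets from the top
            if above + hist[d] >= k:
                pivot = d
                break
            above += hist[d]
        picked.append(idx[digits > pivot]) # definitely in the top-k
        k -= above
        keep = digits == pivot             # recurse into the pivot bucket
        keys, idx = keys[keep], idx[keep]
    picked.append(idx[:k])                 # leftovers are all equal keys
    return np.concatenate(picked)

# E.g. for max inner product search: scores = database @ query.
scores = np.random.default_rng(1).standard_normal(100_000).astype(np.float32)
assert set(radix_select_topk(scores, 10)) == set(np.argsort(scores)[-10:])
</code></pre>
<p>On a GPU, the per-digit histogram and the compaction into the pivot bucket would be the parallel primitives; the sketch above only shows the control flow.</p>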
]]></description><pubDate>Tue, 18 Feb 2025 22:48:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43096113</link><dc:creator>martinloretz</dc:creator><comments>https://news.ycombinator.com/item?id=43096113</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43096113</guid></item><item><title><![CDATA[A three body problem simulator]]></title><description><![CDATA[
<p>Article URL: <a href="https://three-body-problem.martinloretz.com/">https://three-body-problem.martinloretz.com/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42655911">https://news.ycombinator.com/item?id=42655911</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 10 Jan 2025 14:39:05 +0000</pubDate><link>https://three-body-problem.martinloretz.com/</link><dc:creator>martinloretz</dc:creator><comments>https://news.ycombinator.com/item?id=42655911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42655911</guid></item></channel></rss>