New comment by GreyOcten in "Show HN: A GPU/VRAM filter for finding LLMs that will run on your hardware"

GreyOcten — Fri, 26 Jun 2026 20:30:42 +0000

handy, but the gap most of these filters have is that "fits in VRAM" doesn't mean "usable." context length blows up the KV cache fast: a 7B that fits at 2k tokens can OOM at 32k on the same card. factoring context length + quant into the estimate is where it'd actually save people from getting burned.
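A rough sketch of that arithmetic, under stated assumptions: the helper names are hypothetical, the model is assumed to be a Llama-2-7B-style config (32 layers, 32 KV heads, head_dim 128, ~7e9 params), the KV cache is assumed to be fp16, and activations and framework overhead are ignored. It's not how any particular filter works, just the shape of the estimate.

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache.
# Hypothetical helpers; ignores activations, runtime overhead, and fragmentation.

GIB = 1024 ** 3


def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a nominal quantization level."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2,
                   batch: int = 1) -> int:
    """K and V (the factor 2) stored for every layer and every token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len * batch


def estimate_gib(context_len: int, bits_per_weight: float) -> float:
    # Assumed Llama-2-7B-style config: 32 layers, 32 KV heads (no GQA), head_dim 128.
    w = weights_bytes(7e9, bits_per_weight)
    kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                        context_len=context_len)
    return (w + kv) / GIB


for ctx in (2048, 32768):
    print(f"{ctx:>6} tokens, 4-bit weights: {estimate_gib(ctx, 4):.1f} GiB")
```

With these assumptions the weights stay at about 3.3 GiB while the KV cache goes from 1 GiB at 2k tokens to 16 GiB at 32k (512 KiB per token), so the total jumps from roughly 4.3 GiB to 19.3 GiB. Models with grouped-query attention (e.g. 8 KV heads instead of 32) cut the KV term by 4x, which is why the architecture config matters as much as the parameter count.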

Hacker News: GreyOcten