New comment by yuweiloopy2 in "Gemma 3 QAT Models: Bringing AI to Consumer GPUs"

yuweiloopy2 — Mon, 21 Apr 2025 13:26:34 +0000

Been using the 27B QAT model for batch processing 50K+ internal documents. The 128K context is game-changing for our legal review pipeline. I wish token generation were faster, though: at 20 tokens/s it's still too slow for interactive use compared to Claude Opus.

Hacker News: yuweiloopy2
