<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: baibai008989</title><link>https://news.ycombinator.com/user?id=baibai008989</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 04 Apr 2026 09:19:01 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=baibai008989" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by baibai008989 in "Show HN: Three new Kitten TTS models – smallest less than 25MB"]]></title><description><![CDATA[
<p>The dependency-chain issue is a real barrier for edge deployment. I've been running TTS models on a Raspberry Pi for a home-automation project, and anything that pulls in torch + CUDA makes the whole thing a non-starter. 25MB is genuinely exciting for that use case.<p>Curious about the latency characteristics, though. 1.5x realtime on a 9700 is fine for batch processing, but for interactive use you need first-chunk latency under roughly 200ms or the conversation feels broken. Does anyone know if it supports streaming output, or is it full-utterance only?<p>The phoneme-based approach should help with pronunciation consistency too. The models I've tried that work on raw text tend to mispronounce technical terms unpredictably: the same word can come out differently across runs.</p>
]]></description><pubDate>Fri, 20 Mar 2026 11:06:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47452981</link><dc:creator>baibai008989</dc:creator><comments>https://news.ycombinator.com/item?id=47452981</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47452981</guid></item></channel></rss>