Hacker News: ZDisket

New comment by ZDisket in "Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)"

ZDisket — Thu, 19 Mar 2026 22:14:22 +0000

Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio. You just have to modify the phonemizer's dictionary.
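To illustrate the idea, here is a minimal sketch of a dictionary override sitting in front of a phonemizer. Everything in it is an assumption for illustration: `CUSTOM_DICT`, `fallback_phonemize`, and the phoneme strings are hypothetical stand-ins, not the project's actual phonemizer API.

```python
from typing import Callable, Dict, List

# Hypothetical pronunciation overrides (illustrative phoneme strings only).
CUSTOM_DICT: Dict[str, str] = {"onnx": "AA N IH K S"}


def fallback_phonemize(word: str) -> str:
    """Stand-in for the real model-based phonemizer (assumption)."""
    return word


def phonemize(
    text: str,
    overrides: Dict[str, str],
    fallback: Callable[[str], str] = fallback_phonemize,
) -> List[str]:
    """Return one phoneme string per word, preferring dictionary entries."""
    phonemes: List[str] = []
    for token in text.split():
        key = token.lower().strip(".,!?;:\"'")
        if not key:
            continue  # token was punctuation only
        if key in overrides:
            phonemes.append(overrides[key])
        else:
            phonemes.append(fallback(key))
    return phonemes


result = phonemize("Read the ONNX docs.", CUSTOM_DICT)
```

The phonemized output would then be fed to the TTS model unchanged, so fixing a pronunciation only requires touching the dictionary, not the model.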

New comment by ZDisket in "Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)"

ZDisket — Thu, 19 Mar 2026 22:13:05 +0000

No multilingual capabilities yet, although that is planned for the next iteration.

Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)

ZDisket — Wed, 18 Mar 2026 20:48:54 +0000

Hi guys and gals, I made a TTS model based on my highly upgraded VITS base, conditioned on external speaker embeddings (Resemble AI's Resemblyzer).

The model, with ~31M parameters (ONNX), is tuned for latency and local inference, and comes already exported. I was trying to push the limits of what I could do with small, fast models. It runs at 5.6x realtime on a server CPU.
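For reference, a speed figure like this is usually computed as audio duration divided by synthesis time. A minimal sketch, assuming a hypothetical `synthesize` callable that returns the number of audio samples it produced (not the repo's actual API):

```python
import time
from typing import Callable


def realtime_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def measure(synthesize: Callable[[str], int], text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical synthesize() and return its realtime factor."""
    start = time.perf_counter()
    num_samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return realtime_factor(num_samples / sample_rate, elapsed)


# Example arithmetic: 11.2 s of audio generated in 2.0 s of compute.
factor = realtime_factor(11.2, 2.0)
```

At 5.6x, ten seconds of speech would take roughly 1.8 seconds to generate.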

It supports voice cloning and voice blending (mix two or more speakers to make a new voice). The license is Apache 2.0, and it uses DeepPhonemizer (MIT) for phonemization, so no license issues.
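One common way to blend voices with an embedding-conditioned model is a weighted average of the speaker embeddings, renormalized to unit length. The sketch below assumes that approach and uses plain Python lists as stand-ins for the embeddings; `blend_embeddings` is a hypothetical helper, and the repo may do this differently.

```python
import math
from typing import List, Sequence


def blend_embeddings(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of speaker embeddings, rescaled to unit L2 norm."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted mean per dimension.
    mixed = [
        sum((w / total) * e[i] for e, w in zip(embeddings, weights))
        for i in range(dim)
    ]

    # Renormalize so the blend has the same scale as a unit-norm embedding.
    norm = math.sqrt(sum(x * x for x in mixed))
    if norm == 0:
        raise ValueError("blended embedding has zero norm")
    return [x / norm for x in mixed]


# Toy 2-dimensional example: an even mix of two orthogonal "voices".
blended = blend_embeddings([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

Uneven weights (for example 0.7 and 0.3) would pull the result toward one speaker.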

The repo contains the checkpoint, how to run it, and links to Colab and HuggingFace demos.

Now, because it's tiny, audio quality isn't the best, and as it was trained on LibriTTS-R + VCTK (both fully open datasets), speaker similarity isn't as good.

Regardless, I hope it's useful.

Comments URL: https://news.ycombinator.com/item?id=47431263

Points: 4

# Comments: 4

New comment by ZDisket in "Ask HN: What Are You Working On? (March 2026)"

ZDisket — Mon, 09 Mar 2026 16:45:21 +0000

I'll explain in detail once I've got the big release, but everything's been thoroughly modernized: Transformer, HiFi-GAN (now iSTFTNet w/Snake) vocoder, et al., plus a few additions.

New comment by ZDisket in "Ask HN: What Are You Working On? (March 2026)"

ZDisket — Mon, 09 Mar 2026 05:55:09 +0000

Multilingual and local? Try out Supertonic 2.

New comment by ZDisket in "Ask HN: What Are You Working On? (March 2026)"

ZDisket — Mon, 09 Mar 2026 04:47:41 +0000

I'm working on a voice cloning version of my TTS model, a highly upgraded VITS:

https://x.com/ZDi____/status/2013655958027669958

Right now, I only have single-speaker checkpoints (as per the old video). That will change soon.

New comment by ZDisket in "Ask HN: Would you use a job board where every listing is verified?"

ZDisket — Sun, 08 Mar 2026 01:31:45 +0000

Upwork has candidates buy "connects" with real money, which are spent when applying to jobs. Ultimately, it seems some form of payment is a proven gate.