<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ZDisket</title><link>https://news.ycombinator.com/user?id=ZDisket</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 13:38:56 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ZDisket" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ZDisket in "Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)"]]></title><description><![CDATA[
<p>Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio.
You just have to modify the phonemizer's dictionary.</p>
]]></description><pubDate>Thu, 19 Mar 2026 22:14:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47447076</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47447076</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47447076</guid></item><item><title><![CDATA[New comment by ZDisket in "Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)"]]></title><description><![CDATA[
<p>No multilingual capabilities yet, although that is planned for the next iteration.</p>
]]></description><pubDate>Thu, 19 Mar 2026 22:13:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47447051</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47447051</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47447051</guid></item><item><title><![CDATA[Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)]]></title><description><![CDATA[
<p>Hi guys and gals, I made a TTS model based on my highly upgraded VITS base, conditioned on external speaker embeddings (Resemble AI's Resemblyzer).<p>The model, with ~31M parameters (ONNX), is tuned for latency and local inference, and comes already exported. I was trying to push the limits of what I could do with small, fast models. It runs 5.6x realtime on a server CPU.<p>It supports voice cloning and voice blending (mixing two or more speakers to make a new voice). The license is Apache 2.0, and it uses DeepPhonemizer (MIT) for phonemization, so there are no license issues.<p>The repo contains the checkpoint, instructions for running it, and links to Colab and HuggingFace demos.<p>Now, because it's tiny, audio quality isn't the best, and as it was trained on LibriTTS-R + VCTK (both fully open datasets), speaker similarity isn't as good.<p>Regardless, I hope it's useful.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47431263">https://news.ycombinator.com/item?id=47431263</a></p>
<p>Points: 4</p>
<p># Comments: 4</p>
]]></description><pubDate>Wed, 18 Mar 2026 20:48:54 +0000</pubDate><link>https://github.com/ZDisket/vits-evo</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47431263</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47431263</guid></item><item><title><![CDATA[New comment by ZDisket in "Ask HN: What Are You Working On? (March 2026)"]]></title><description><![CDATA[
<p>I'll explain in detail once the big release is out, but everything's been thoroughly modernized: Transformer, HiFi-GAN vocoder (now iSTFTNet w/ Snake), etc., plus a few additions.</p>
]]></description><pubDate>Mon, 09 Mar 2026 16:45:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47311520</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47311520</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47311520</guid></item><item><title><![CDATA[New comment by ZDisket in "Ask HN: What Are You Working On? (March 2026)"]]></title><description><![CDATA[
<p>Multilingual and local? Try out Supertonic 2.</p>
]]></description><pubDate>Mon, 09 Mar 2026 05:55:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47305314</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47305314</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47305314</guid></item><item><title><![CDATA[New comment by ZDisket in "Ask HN: What Are You Working On? (March 2026)"]]></title><description><![CDATA[
<p>I'm working on a voice cloning version of my TTS model, a highly upgraded VITS:<p><a href="https://x.com/ZDi____/status/2013655958027669958" rel="nofollow">https://x.com/ZDi____/status/2013655958027669958</a><p>Right now, I only have single-speaker checkpoints (as per the old video). That will change soon.</p>
]]></description><pubDate>Mon, 09 Mar 2026 04:47:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47304970</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47304970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47304970</guid></item><item><title><![CDATA[New comment by ZDisket in "Ask HN: Would you use a job board where every listing is verified?"]]></title><description><![CDATA[
<p>Upwork has candidates buy "connects" with real money, which are then spent when applying to jobs. Ultimately, it seems some form of payment is a proven gate.</p>
]]></description><pubDate>Sun, 08 Mar 2026 01:31:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47293392</link><dc:creator>ZDisket</dc:creator><comments>https://news.ycombinator.com/item?id=47293392</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47293392</guid></item></channel></rss>