<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: watsonmusic</title><link>https://news.ycombinator.com/user?id=watsonmusic</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 29 Apr 2026 18:48:45 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=watsonmusic" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>it's not OSS</p>
]]></description><pubDate>Wed, 03 Sep 2025 15:04:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116659</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116659</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116659</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>bonus usage</p>
]]></description><pubDate>Wed, 03 Sep 2025 15:04:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116655</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116655</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116655</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>11labs is facing a real competitor</p>
]]></description><pubDate>Wed, 03 Sep 2025 14:58:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116592</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116592</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116592</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>genius</p>
]]></description><pubDate>Wed, 03 Sep 2025 14:57:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116577</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116577</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116577</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>this model is superb</p>
]]></description><pubDate>Wed, 03 Sep 2025 14:56:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116568</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116568</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>Microsoft is cool</p>
]]></description><pubDate>Wed, 03 Sep 2025 14:56:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116560</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116560</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116560</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>yes the best</p>
]]></description><pubDate>Wed, 03 Sep 2025 14:53:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116531</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116531</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116531</guid></item><item><title><![CDATA[New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"]]></title><description><![CDATA[
<p>one of the best models built by Microsoft</p>
]]></description><pubDate>Wed, 03 Sep 2025 14:51:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=45116509</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45116509</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45116509</guid></item><item><title><![CDATA[New comment by watsonmusic in "Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio"]]></title><description><![CDATA[
<p><a href="https://github.com/microsoft/VibeVoice" rel="nofollow">https://github.com/microsoft/VibeVoice</a></p>
]]></description><pubDate>Tue, 26 Aug 2025 13:25:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45026227</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45026227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45026227</guid></item><item><title><![CDATA[New comment by watsonmusic in "Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio"]]></title><description><![CDATA[
<p><a href="https://huggingface.co/microsoft/VibeVoice-1.5B" rel="nofollow">https://huggingface.co/microsoft/VibeVoice-1.5B</a></p>
]]></description><pubDate>Tue, 26 Aug 2025 13:25:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45026226</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45026226</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45026226</guid></item><item><title><![CDATA[New comment by watsonmusic in "Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio"]]></title><description><![CDATA[
<p>VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers preserve audio fidelity while substantially improving computational efficiency on long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details. The model can synthesize speech up to 90 minutes long with up to 4 distinct speakers, surpassing the typical 1-2 speaker limits of many prior models.</p>
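<p>As a rough illustration of why a 7.5 Hz frame rate matters for long-form audio, the sketch below counts the latent frames a tokenizer at that rate would emit over a 90-minute session. The 7.5 Hz rate and 90-minute length come from the description above; the 50 Hz comparison rate is an assumed example of a higher-rate tokenizer, not a VibeVoice figure.</p>

```python
# Back-of-the-envelope frame counts for long-form audio.
# 7.5 Hz and 90 minutes come from the VibeVoice description;
# 50 Hz is an assumed comparison rate, not a VibeVoice figure.

def frame_count(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of latent frames a tokenizer at frame_rate_hz emits for the given duration."""
    seconds = duration_minutes * 60
    return round(seconds * frame_rate_hz)

VIBEVOICE_RATE_HZ = 7.5
ASSUMED_HIGH_RATE_HZ = 50.0  # hypothetical higher-rate tokenizer for comparison
DURATION_MIN = 90

low = frame_count(DURATION_MIN, VIBEVOICE_RATE_HZ)
high = frame_count(DURATION_MIN, ASSUMED_HIGH_RATE_HZ)

print(f"7.5 Hz: {low} frames")
print(f"50 Hz:  {high} frames")
print(f"reduction: {high / low:.2f}x")
```

<p>Fewer frames per second means a shorter sequence for the LLM to attend over, which is what makes 90-minute generation tractable in the first place.</p>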
]]></description><pubDate>Tue, 26 Aug 2025 13:24:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=45026219</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45026219</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45026219</guid></item><item><title><![CDATA[Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio]]></title><description><![CDATA[
<p>Article URL: <a href="https://microsoft.github.io/VibeVoice/">https://microsoft.github.io/VibeVoice/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45026218">https://news.ycombinator.com/item?id=45026218</a></p>
<p>Points: 3</p>
<p># Comments: 3</p>
]]></description><pubDate>Tue, 26 Aug 2025 13:24:42 +0000</pubDate><link>https://microsoft.github.io/VibeVoice/</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=45026218</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45026218</guid></item><item><title><![CDATA[New comment by watsonmusic in "Reinforcement Pre-Training"]]></title><description><![CDATA[
<p>can't wait to see how it goes beyond the current LLM training pipeline</p>
]]></description><pubDate>Tue, 10 Jun 2025 16:39:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44238819</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=44238819</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44238819</guid></item><item><title><![CDATA[New comment by watsonmusic in "Reinforcement Pre-Training"]]></title><description><![CDATA[
<p>it could be adaptive: only high-value tokens would be allocated more compute</p>
]]></description><pubDate>Tue, 10 Jun 2025 16:37:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44238797</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=44238797</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44238797</guid></item><item><title><![CDATA[New comment by watsonmusic in "Reinforcement Pre-Training"]]></title><description><![CDATA[
<p>A new scaling paradigm finally comes out!</p>
]]></description><pubDate>Tue, 10 Jun 2025 16:34:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=44238774</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=44238774</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44238774</guid></item><item><title><![CDATA[New comment by watsonmusic in "Reinforcement Pre-Training"]]></title><description><![CDATA[
<p>the 14B model performs comparably to the 32B one. the improvement is huge</p>
]]></description><pubDate>Tue, 10 Jun 2025 16:33:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=44238758</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=44238758</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44238758</guid></item><item><title><![CDATA[New comment by watsonmusic in "Differential Transformer"]]></title><description><![CDATA[
<p>negative values can enhance expressivity</p>
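<p>The Differential Transformer computes attention as the difference of two softmax maps, softmax(Q1 K1^T / sqrt(d)) - λ · softmax(Q2 K2^T / sqrt(d)), so individual weights can drop below zero. The toy sketch below, in plain Python with made-up scores and an assumed λ of 0.8, shows a negative weight appearing on the key that the second map attends to; in the actual model λ and the projections are learned.</p>

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def differential_weights(scores_1, scores_2, lam):
    """Differential attention weights for one query: softmax(s1) - lam * softmax(s2)."""
    a1 = softmax(scores_1)
    a2 = softmax(scores_2)
    return [p - lam * q for p, q in zip(a1, a2)]

# Hypothetical pre-scaled scores (q.k / sqrt(d)) of one query against three keys.
# Map 1 focuses on key 0; map 2 puts most of its mass on key 2 (e.g. a "noise" key).
scores_1 = [2.0, 0.5, 0.5]
scores_2 = [0.0, 0.0, 2.0]
LAMBDA = 0.8  # assumed value; learned in the actual model

weights = differential_weights(scores_1, scores_2, LAMBDA)
print([round(w, 3) for w in weights])
```

<p>Because each softmax sums to 1, the differential weights sum to 1 - λ rather than 1, and keys favored by the second map can receive negative weight, which a single softmax can never produce.</p>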
]]></description><pubDate>Tue, 08 Oct 2024 17:37:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=41779779</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=41779779</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41779779</guid></item><item><title><![CDATA[New comment by watsonmusic in "Differential Transformer"]]></title><description><![CDATA[
<p>not all hallucinations are creativity.
Imagine a RAG application, where the model is supposed to follow the given documents.</p>
]]></description><pubDate>Tue, 08 Oct 2024 13:56:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41777437</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=41777437</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41777437</guid></item><item><title><![CDATA[New comment by watsonmusic in "Differential Transformer"]]></title><description><![CDATA[
<p>the model is supposed to learn this</p>
]]></description><pubDate>Tue, 08 Oct 2024 13:05:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41776928</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=41776928</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41776928</guid></item><item><title><![CDATA[New comment by watsonmusic in "Differential Transformer"]]></title><description><![CDATA[
<p>The modification is simple and beautiful. And the improvements are quite significant.</p>
]]></description><pubDate>Tue, 08 Oct 2024 13:02:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41776910</link><dc:creator>watsonmusic</dc:creator><comments>https://news.ycombinator.com/item?id=41776910</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41776910</guid></item></channel></rss>