Hacker News: watsonmusic

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 15:04:47 +0000

it's not oss

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 15:04:27 +0000

bonus usage

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 14:58:36 +0000

11labs is facing a real competitor

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 14:57:34 +0000

genius

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 14:56:50 +0000

this model is superb

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 14:56:08 +0000

Microsoft is cool

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 14:53:53 +0000

yes the best

New comment by watsonmusic in "VibeVoice: A Frontier Open-Source Text-to-Speech Model"

watsonmusic — Wed, 03 Sep 2025 14:51:54 +0000

one of the best models built by Microsoft

New comment by watsonmusic in "Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio"

watsonmusic — Tue, 26 Aug 2025 13:25:46 +0000

https://github.com/microsoft/VibeVoice

New comment by watsonmusic in "Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio"

watsonmusic — Tue, 26 Aug 2025 13:25:35 +0000

https://huggingface.co/microsoft/VibeVoice-1.5B

New comment by watsonmusic in "Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio"

watsonmusic — Tue, 26 Aug 2025 13:24:42 +0000

VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details. The model can synthesize speech up to 90 minutes long with up to 4 distinct speakers, surpassing the typical 1-2 speaker limits of many prior models.
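
To put the 7.5 Hz figure in perspective, here is a rough back-of-the-envelope sketch of how many frames a 90-minute session produces. The 7.5 Hz rate and the 90-minute limit come from the description above; the 50 Hz comparison rate is purely an illustrative assumption, not a number from the VibeVoice release, and `frames_for` is a hypothetical helper.

```python
# Back-of-the-envelope: sequence length implied by a tokenizer frame rate.
FRAME_RATE_HZ = 7.5          # from the description above
MAX_MINUTES = 90             # from the description above
ASSUMED_BASELINE_HZ = 50.0   # assumption for comparison only, not from the source


def frames_for(minutes: float, frame_rate_hz: float) -> int:
    """Number of frames needed to cover `minutes` of audio."""
    return int(minutes * 60 * frame_rate_hz)


vibevoice_frames = frames_for(MAX_MINUTES, FRAME_RATE_HZ)
baseline_frames = frames_for(MAX_MINUTES, ASSUMED_BASELINE_HZ)

print(vibevoice_frames)  # 40500
print(baseline_frames)   # 270000
```

At 7.5 Hz, a full 90-minute session is about 40,500 frames, which is why a low frame rate matters for fitting long-form audio into an LLM context.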

Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio

watsonmusic — Tue, 26 Aug 2025 13:24:42 +0000

Article URL: https://microsoft.github.io/VibeVoice/

Comments URL: https://news.ycombinator.com/item?id=45026218

Points: 3

# Comments: 3

New comment by watsonmusic in "Reinforcement Pre-Training"

watsonmusic — Tue, 10 Jun 2025 16:39:25 +0000

cannot wait to see how it goes beyond the current LLM training pipeline

New comment by watsonmusic in "Reinforcement Pre-Training"

watsonmusic — Tue, 10 Jun 2025 16:37:41 +0000

it could be adaptive: only high-value tokens would be allocated more compute

New comment by watsonmusic in "Reinforcement Pre-Training"

watsonmusic — Tue, 10 Jun 2025 16:34:49 +0000

A new scaling paradigm finally comes out!

New comment by watsonmusic in "Reinforcement Pre-Training"

watsonmusic — Tue, 10 Jun 2025 16:33:04 +0000

the 14B model performs comparably to a 32B model. the improvement is huge

New comment by watsonmusic in "Differential Transformer"

watsonmusic — Tue, 08 Oct 2024 17:37:10 +0000

negative values can enhance expressiveness
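
A minimal sketch of where the negative values come from, assuming the differential-attention idea of subtracting one softmax map from another, scaled by a factor. The scores and the value `lam=0.8` are made up for illustration, and `differential_weights` is a hypothetical helper, not code from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def differential_weights(scores_a, scores_b, lam):
    """Difference of two softmax maps: softmax(a) - lam * softmax(b)."""
    a = softmax(scores_a)
    b = softmax(scores_b)
    return [x - lam * y for x, y in zip(a, b)]


# Illustrative scores: the first map favors position 0, the second position 1.
weights = differential_weights([2.0, 0.0, 0.0], [0.0, 2.0, 0.0], lam=0.8)
print(weights)
```

A plain softmax can only produce positive weights, while the difference can go below zero (here at position 1), so the layer can actively subtract a value vector rather than only mix values in.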

New comment by watsonmusic in "Differential Transformer"

watsonmusic — Tue, 08 Oct 2024 13:56:48 +0000

not all hallucinations are creativity. Imagine a RAG application, where the model is supposed to follow the given documents

New comment by watsonmusic in "Differential Transformer"

watsonmusic — Tue, 08 Oct 2024 13:05:02 +0000

the model is supposed to learn this

New comment by watsonmusic in "Differential Transformer"

watsonmusic — Tue, 08 Oct 2024 13:02:48 +0000

The modification is simple and beautiful. And the improvements are quite significant.