<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jmvalin</title><link>https://news.ycombinator.com/user?id=jmvalin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 03 May 2026 21:18:51 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jmvalin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jmvalin in "Opus 1.5 released: Opus gets a machine learning upgrade"]]></title><description><![CDATA[
<p>Actually, what we're doing with DRED isn't that far from what you're suggesting. The difference is that we keep more information about the voice/intonation and we don't need the latency that would otherwise be added by an ASR. In the end, the output is still synthesized from higher-level, efficiently compressed information.</p>
]]></description><pubDate>Tue, 05 Mar 2024 18:02:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=39606999</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=39606999</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39606999</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus 1.5 released: Opus gets a machine learning upgrade"]]></title><description><![CDATA[
<p>What the PLC does is (vaguely) equivalent to momentarily freezing the image rather than showing a blank screen when packets are lost. If you're in the middle of a vowel, it'll continue the vowel (trying to follow the right energy) for about 100 ms before fading out. It's explicitly designed not to make up anything you didn't say -- for obvious reasons.</p>
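<p>For intuition, here is a minimal sketch of that freeze-and-fade behavior on raw samples (frame size and fade length are assumptions for illustration; the real PLC works on acoustic features with a neural model):</p><pre><code>#define FRAME_SIZE 320   /* 20 ms at 16 kHz (assumed) */
#define FADE_FRAMES 5    /* ~100 ms of fade before muting */

/* Fill one lost frame by repeating the last good frame with a
   linearly decaying gain; after FADE_FRAMES lost frames, output
   silence. Nothing new is ever invented, existing content is only
   extended and attenuated. */
void conceal_frame(const float *last_good, float *out, int lost_count)
{
    float gain = lost_count &lt; FADE_FRAMES
               ? 1.f - (float)lost_count/FADE_FRAMES : 0.f;
    for (int i = 0; i &lt; FRAME_SIZE; i++)
        out[i] = gain*last_good[i];
}
</code></pre>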
]]></description><pubDate>Tue, 05 Mar 2024 00:50:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=39598117</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=39598117</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39598117</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus 1.5 released: Opus gets a machine learning upgrade"]]></title><description><![CDATA[
<p>Well, there are different ways to make things up. We decided against using a pure generative model to avoid making up phonemes or words. Instead, we predict the expected acoustic features (using a regression loss), which means the model is able to continue a vowel. If unsure, it'll just pick the "middle point", which won't be something recognizable as a new word. That's in line with how traditional PLCs work; it just sounds better. The only generative part is the vocoder that reconstructs the waveform, but it's constrained to match the predicted spectrum so it can't hallucinate either.</p>
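<p>To illustrate why regression ends up at the "middle point", here is a sketch of such a loss over acoustic feature vectors (the dimension is an assumed placeholder, not the real feature layout):</p><pre><code>#define NB_FEATURES 20  /* assumed dimension, for illustration */

/* L2 regression loss between predicted and reference features.
   Minimizing it pushes an uncertain predictor toward the mean of
   the plausible continuations rather than sampling a new, possibly
   wrong, phoneme the way a generative loss would allow. */
float feature_loss(const float *pred, const float *target)
{
    float loss = 0.f;
    for (int i = 0; i &lt; NB_FEATURES; i++) {
        float e = pred[i] - target[i];
        loss += e*e;
    }
    return loss/NB_FEATURES;
}
</code></pre>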
]]></description><pubDate>Mon, 04 Mar 2024 20:50:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=39595885</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=39595885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39595885</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus 1.5 released: Opus gets a machine learning upgrade"]]></title><description><![CDATA[
<p>Quoting from our paper, training was done using "205 hours of 16-kHz speech from a combination of TTS datasets including more than 900 speakers in 34 languages and dialects". Mostly tested with English, but part of the idea of releasing early (none of that is standardized) is for people to try it out and report any issues.<p>There are roughly equal numbers of male and female speakers, though codecs always have slight perceptual quality biases (in either direction) that depend on the pitch. Oh, and everything here is speech only.</p>
]]></description><pubDate>Mon, 04 Mar 2024 20:42:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=39595770</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=39595770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39595770</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus 1.5 released: Opus gets a machine learning upgrade"]]></title><description><![CDATA[
<p>As part of the packet loss challenge, there was an ASR word accuracy evaluation to see how PLC impacted intelligibility. See <a href="https://www.microsoft.com/en-us/research/academic-program/audio-deep-packet-loss-concealment-challenge-interspeech-2022/results/" rel="nofollow">https://www.microsoft.com/en-us/research/academic-program/au...</a><p>The good news is that we were able to improve intelligibility slightly compared with filling with zeros (it's also a lot less annoying to listen to). The bad news is that you can only do so much with PLC, which is why we then pursued the Deep Redundancy (DRED) idea.</p>
]]></description><pubDate>Mon, 04 Mar 2024 20:32:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=39595659</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=39595659</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39595659</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus Databending Drumkit"]]></title><description><![CDATA[
<p>(Opus author here) I'm curious what kind of "glitch" this is referring to.</p>
]]></description><pubDate>Tue, 01 Aug 2023 20:40:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=36962500</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=36962500</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36962500</guid></item><item><title><![CDATA[New comment by jmvalin in "Freenode ops take control of 700 channels"]]></title><description><![CDATA[
<p>> This is entirely my fault, and I take all the blame for that.<p>You shouldn't be blaming yourself, it was the best thing to do. Some people may have been confused over who the "good guys" were in this mess. By taking over all these channels you made everything perfectly clear. No amount of arguing could have made things clearer than your actions.</p>
]]></description><pubDate>Wed, 26 May 2021 07:53:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=27287624</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=27287624</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27287624</guid></item><item><title><![CDATA[New comment by jmvalin in "Show HN: Alsa_rnnoise – RNNoise-based noise removal plugin for ALSA"]]></title><description><![CDATA[
<p>No, exactly <i>none</i> of that data was used for training. The training was done before the demo that was asking for noise contributions. The contributions are CC0, but were never used (i.e. totally unknown dataset quality).</p>
]]></description><pubDate>Sun, 31 Jan 2021 20:38:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=25982345</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=25982345</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=25982345</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus Audio Codec – FAQ"]]></title><description><![CDATA[
<p>All major browsers now implement WebRTC, including Opus support. Also, most browsers now support Opus playback in HTML5, though AFAIK Safari only supports it in the CAF container. See <a href="https://caniuse.com/#search=opus" rel="nofollow">https://caniuse.com/#search=opus</a></p>
]]></description><pubDate>Sun, 16 Aug 2020 06:08:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=24175629</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=24175629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24175629</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>I didn't say "impossible", merely "not simple". The minute you bring in a GAN, things are already not simple. Also, I'm not aware of any work on a GAN that works with a network that does conditional sampling (like LPCNet/WaveNet), so it would mean starting from scratch.</p>
]]></description><pubDate>Sat, 30 Mar 2019 14:38:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=19529609</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19529609</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19529609</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>In theory, it wouldn't be too hard to implement with a neural network. In theory. In practice, the problem is figuring out how to do the training because I don't have 2 hours of your voice saying the same thing as the target voice and with perfect alignment. I suspect it's still possible, but it's not a simple thing either.</p>
]]></description><pubDate>Sat, 30 Mar 2019 01:47:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=19527084</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19527084</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19527084</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>The cepstrum that takes up most of the bits (or the LSPs in other codecs) is actually a model of the larynx -- another reason why it doesn't do well on music. Because of the accuracy needed to exactly represent the filter that the larynx makes, plus the fact that it can move relatively quickly, there's indeed a significant number of bits involved here.<p>The bitrate could definitely be reduced (possibly by 50%+) by using packets of 1 second along with entropy coding, but the resulting codec would not be very useful for voice communication. You want packets short enough to get decent latency, and if you use RF, then VBR makes things a lot more complicated (and less robust).</p>
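<p>Rough arithmetic, assuming the 40-ms packets used in the demo:</p><pre><code>#include &lt;stdio.h>

int main(void)
{
    int bitrate = 1600;  /* bits per second */
    int packet_ms = 40;  /* packet duration (assumed from the demo) */
    int bits = bitrate*packet_ms/1000;  /* 64 bits per packet */
    /* A 1-second packet would carry 1600 bits that entropy coding
       could exploit much better, at the cost of ~1 s of latency. */
    printf("%d bits (%d bytes) per packet\n", bits, bits/8);
    return 0;
}
</code></pre>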
]]></description><pubDate>Sat, 30 Mar 2019 01:24:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=19526965</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19526965</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19526965</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>Actually, what's in the demo already includes pruning (through sparse matrices) and indeed, it does keep just 1/10 of the weights as non-zero. In practice it's not quite a 10x speedup because the network has to be a bit bigger to get the same performance. It's still a pretty significant improvement. Of course, the weights are pruned by 16x1 blocks to avoid hurting vectorization (see the first LPCNet paper and the WaveRNN paper for details).</p>
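<p>A hypothetical sketch of that kind of block pruning (magnitude-based; choosing the threshold so that roughly 1/10 of the blocks survive is left out):</p><pre><code>#include &lt;math.h>

/* Zero out every 16x1 column block of a row-major rows-by-cols
   matrix whose L2 norm falls below thresh. Zeroed blocks can then
   be skipped entirely by a 16-wide vectorized matrix-vector
   product. rows is assumed to be a multiple of 16. */
void prune_16x1(float *w, int rows, int cols, float thresh)
{
    for (int c = 0; c &lt; cols; c++) {
        for (int r = 0; r &lt; rows; r += 16) {
            float norm2 = 0.f;
            for (int k = 0; k &lt; 16; k++)
                norm2 += w[(r+k)*cols + c]*w[(r+k)*cols + c];
            if (sqrtf(norm2) &lt; thresh)
                for (int k = 0; k &lt; 16; k++)
                    w[(r+k)*cols + c] = 0.f;
        }
    }
}
</code></pre>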
]]></description><pubDate>Fri, 29 Mar 2019 22:53:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=19526328</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19526328</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19526328</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>Iridium appears to be using a vocoder called AMBE. Its quality is similar to that of the MELP codec from the demo, and it also runs at 2.4 kb/s. LPCNet at 1.6 kb/s is a significant improvement over that -- if you can afford the complexity, of course (at least it'll work on a phone now).</p>
]]></description><pubDate>Fri, 29 Mar 2019 21:38:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=19525901</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19525901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19525901</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>Well, in the case of music, what happens is that due to the low bit-rate there are many different signals that can produce the same features. The LPCNet model is trained to reproduce whatever is most likely to be a single person speaking. The more advanced the model, the more speech-like the music is likely to turn out.<p>When it comes to noisy speech, it should be possible to improve things by actually training on noisy speech (the current model is trained only on clean speech). Stay tuned :-)</p>
]]></description><pubDate>Fri, 29 Mar 2019 21:24:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=19525811</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19525811</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19525811</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>Keep in mind that the very first CELP speech codec (in 1984) used to take 90 seconds to encode just 1 second of speech... on a Cray supercomputer. Ten years later, people had that running in their cell phones. It's not just that hardware keeps getting faster, but algorithms are also getting more efficient. LPCNet is already 1/100 the complexity of the original WaveNet (which is just 2 years old) and I'm pretty sure it's still far from optimal.</p>
]]></description><pubDate>Fri, 29 Mar 2019 20:32:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=19525385</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19525385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19525385</guid></item><item><title><![CDATA[New comment by jmvalin in "A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet"]]></title><description><![CDATA[
<p>Actually, this won't work <i>at all</i> for music because it makes the fundamental assumption that the signal is speech. For normal conversations, it should work, though for now the models are not yet as robust as I'd like (in case of noise and reverberation). That's next on the list of things to improve.</p>
]]></description><pubDate>Fri, 29 Mar 2019 20:21:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=19525277</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=19525277</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19525277</guid></item><item><title><![CDATA[LPCNet: DSP-Boosted Neural Speech Synthesis]]></title><description><![CDATA[
<p>Article URL: <a href="https://people.xiph.org/~jm/demo/lpcnet/">https://people.xiph.org/~jm/demo/lpcnet/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=18495603">https://news.ycombinator.com/item?id=18495603</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 20 Nov 2018 17:27:59 +0000</pubDate><link>https://people.xiph.org/~jm/demo/lpcnet/</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=18495603</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=18495603</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus 1.3 Released"]]></title><description><![CDATA[
<p>Like many other audio codecs, Opus lets the encoder decide how to spend the bits it has -- on which frames and on which frequency bands. On top of that, it has a few special features that also require decisions from the encoder. So while the decoder doesn't change, the encoder can be improved to make better decisions. While the format itself is not perfect, I have not come across any particular thing that would be worth breaking compatibility over. I prefer working within the constraints of the bitstream to keep improving the quality.</p>
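<p>For example, those decisions are steered entirely through the encoder API with no change to the bitstream (standard libopus calls; error handling elided):</p><pre><code>#include &lt;opus/opus.h>

OpusEncoder *make_voip_encoder(void)
{
    int err;
    OpusEncoder *enc = opus_encoder_create(48000, 1,
                           OPUS_APPLICATION_VOIP, &amp;err);
    /* Knobs like these change how the encoder spends its bits;
       any decoder since Opus 1.0 plays the result. */
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(24000));
    opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(10));
    opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1));
    return enc;
}
</code></pre>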
]]></description><pubDate>Fri, 19 Oct 2018 06:05:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=18254517</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=18254517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=18254517</guid></item><item><title><![CDATA[New comment by jmvalin in "Opus 1.3 Released"]]></title><description><![CDATA[
<p>The reason we are not calling it Opus 2 is that it could confuse some people into thinking we broke compatibility. Opus 1.3 is perfectly compatible with Opus 1.0, and all future releases will keep that compatibility.</p>
]]></description><pubDate>Thu, 18 Oct 2018 22:31:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=18252771</link><dc:creator>jmvalin</dc:creator><comments>https://news.ycombinator.com/item?id=18252771</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=18252771</guid></item></channel></rss>