<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: nshm</title><link>https://news.ycombinator.com/user?id=nshm</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 28 Apr 2026 23:38:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=nshm" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by nshm in "The past was not that cute"]]></title><description><![CDATA[
<p>> Running a family was a brutal two-person job -- and the kids had to dive in to help out the second they could lift something heavier than a couple pounds.<p>Orphans did struggle, but most families were not just two people; families were big and supported by the community.</p>
]]></description><pubDate>Sun, 07 Dec 2025 11:28:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=46180969</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=46180969</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46180969</guid></item><item><title><![CDATA[New comment by nshm in "Omnilingual ASR: Advancing automatic speech recognition for 1600 languages"]]></title><description><![CDATA[
<p>You can check out this whale sound recognition project: <a href="https://arxiv.org/abs/2104.08614" rel="nofollow">https://arxiv.org/abs/2104.08614</a></p>
]]></description><pubDate>Tue, 11 Nov 2025 05:26:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45884398</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=45884398</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45884398</guid></item><item><title><![CDATA[New comment by nshm in "Omnilingual ASR: Advancing automatic speech recognition for 1600 languages"]]></title><description><![CDATA[
<p>Moreover, you cannot fine-tune those models for practical applications. The model was originally trained on very clean data, so the lower layers are also not very stable for diverse inputs. To fine-tune, you have to update the whole model, not just the upper layers.</p>
]]></description><pubDate>Tue, 11 Nov 2025 05:05:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=45884304</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=45884304</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45884304</guid></item><item><title><![CDATA[New comment by nshm in "Omnilingual ASR: Advancing automatic speech recognition for 1600 languages"]]></title><description><![CDATA[
<p>This model is actually expected to be bad for popular languages. Just like the previous MMS, it is not accurate at all; it wins by supporting rare languages well, but it never had good ASR accuracy even for Swedish etc. It is more a research thing than a real tool, unlike Whisper.</p>
]]></description><pubDate>Tue, 11 Nov 2025 04:21:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45884089</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=45884089</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45884089</guid></item><item><title><![CDATA[New comment by nshm in "Sesame CSM: A Conversational Speech Generation Model"]]></title><description><![CDATA[
<p>It is actually useless. It is very slow, the quality is suboptimal, and it is just the speech generation component. See the discussion here:<p><a href="https://github.com/SesameAILabs/csm/issues/80" rel="nofollow">https://github.com/SesameAILabs/csm/issues/80</a></p>
]]></description><pubDate>Tue, 18 Mar 2025 15:33:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43400664</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=43400664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43400664</guid></item><item><title><![CDATA[New comment by nshm in "What happened to BERT and T5?"]]></title><description><![CDATA[
<p>No, there are mathematical reasons LLMs are better. They are trained with a multiobjective loss (coding skills, translation skills, etc.), so they understand the world much better than MLM. The original post discusses that, but with more words and points than necessary.</p>
]]></description><pubDate>Fri, 19 Jul 2024 21:02:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=41011183</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=41011183</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41011183</guid></item><item><title><![CDATA[New comment by nshm in "Reasoning in Large Language Models: A Geometric Perspective"]]></title><description><![CDATA[
<p>It is actually pretty straightforward why those models "reason" or, to be more exact, can operate on complex concepts. By processing huge amounts of text they build an internal representation where those concepts are represented as simple nodes (neurons or groups). So they really distill knowledge. Alternatively, you can think about it as a very good principal component analysis that can extract many important aspects. Or like a semantic graph built automatically.<p>Once knowledge is distilled, you can easily build on top of it, for example by merging concepts.<p>So no secret here.</p>
]]></description><pubDate>Sun, 07 Jul 2024 21:11:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=40900462</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=40900462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40900462</guid></item><item><title><![CDATA[New comment by nshm in "ChatTTS-Best open source TTS Model"]]></title><description><![CDATA[
<p>There is also a glitch in "dialogue"</p>
]]></description><pubDate>Wed, 29 May 2024 06:06:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=40508964</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=40508964</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40508964</guid></item><item><title><![CDATA[New comment by nshm in "Sergey Brin on Gemini 1.5 Pro (03/02/2024) [video]"]]></title><description><![CDATA[
<p>Does anyone besides me think he doesn't look very healthy? It's strange that he is kind of slow in the video where he enters the room. Maybe some biohacking.</p>
]]></description><pubDate>Mon, 04 Mar 2024 19:18:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=39594710</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=39594710</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39594710</guid></item><item><title><![CDATA[New comment by nshm in "BASE TTS: The largest text-to-speech model to-date"]]></title><description><![CDATA[
<p>Yes, it is one of the important aspects, in particular if you use TTS to create an audiobook or in video production.</p>
]]></description><pubDate>Thu, 15 Feb 2024 21:53:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=39389554</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=39389554</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39389554</guid></item><item><title><![CDATA[New comment by nshm in "BASE TTS: The largest text-to-speech model to-date"]]></title><description><![CDATA[
<p>Err, I deeply respect the Amazon TTS team, but this paper and the synthesis are..... You publish the paper in 2024 and include YourTTS in your baselines to look better. Come on! There is XTTS2 around!<p>The voice sounds robotic and plain. Most likely there are a lot of audiobooks in the training data and less conversational speech. And dropping diffusion was not a great idea: the voice is not crystal clear anymore, it is more like a telephony recording.</p>
]]></description><pubDate>Wed, 14 Feb 2024 21:58:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=39376121</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=39376121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39376121</guid></item><item><title><![CDATA[New comment by nshm in "BASE TTS: The largest text-to-speech model to-date"]]></title><description><![CDATA[
<p>Metavoice is one of a dozen GPT-based TTS systems around, starting from Tortoise. And not that great, honestly. You can clearly hear "glass scratches" in their sound; that is because they trained on MP3-compressed data.<p>There are much clearer-sounding systems around. You can listen to StyleTTS2 to compare.</p>
]]></description><pubDate>Wed, 14 Feb 2024 21:51:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=39376036</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=39376036</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39376036</guid></item><item><title><![CDATA[New comment by nshm in "OpenAI releases Whisper v3, new generation open source ASR model"]]></title><description><![CDATA[
<p>Good improvements for many languages; the numbers are here:<p><a href="https://github.com/openai/whisper/blob/main/language-breakdown.svg">https://github.com/openai/whisper/blob/main/language-breakdo...</a></p>
]]></description><pubDate>Mon, 06 Nov 2023 19:06:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=38167225</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=38167225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38167225</guid></item><item><title><![CDATA[New comment by nshm in "Goodbye, Node.js Buffer"]]></title><description><![CDATA[
<p>Ok, first we screwed up buffers by making them globally tracked instead of just a piece of memory. Now it's time to break all binary modules again.</p>
]]></description><pubDate>Tue, 24 Oct 2023 15:27:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=38000499</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=38000499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38000499</guid></item><item><title><![CDATA[New comment by nshm in "1400 year old gold foil figures found in Norwegian pagan temple"]]></title><description><![CDATA[
<p>Ok, but the photos look very suspicious. 1400-year-old gold straight from the ground shouldn't shine like that. Compare to the coins here, for example:<p><a href="https://www.smithsonianmag.com/smart-news/ancient-welsh-gold-coins-a-first-from-the-iron-age-declared-treasure-180982730/" rel="nofollow noreferrer">https://www.smithsonianmag.com/smart-news/ancient-welsh-gold...</a></p>
]]></description><pubDate>Sun, 08 Oct 2023 07:29:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=37808562</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=37808562</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37808562</guid></item><item><title><![CDATA[New comment by nshm in "LLaMa running at 5 tokens/second on a Pixel 6"]]></title><description><![CDATA[
<p>Great, thanks a lot.<p>So we have numbers on PTB: original perplexity 8.79, quantized 9.68, already 10% worse. And PPL is reported per token, I suppose? Because word PPL for PTB must be around 20, not less than 10.<p>Any numbers on more complex tasks then, like QA?</p>
]]></description><pubDate>Wed, 15 Mar 2023 18:30:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=35172630</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=35172630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35172630</guid></item><item><title><![CDATA[New comment by nshm in "LLaMa running at 5 tokens/second on a Pixel 6"]]></title><description><![CDATA[
<p>Do you have the numbers? I suspect it is way worse. The original llama.cpp authors never measured any numbers either.</p>
]]></description><pubDate>Wed, 15 Mar 2023 17:54:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=35172161</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=35172161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35172161</guid></item><item><title><![CDATA[New comment by nshm in "LLaMa running at 5 tokens/second on a Pixel 6"]]></title><description><![CDATA[
<p>It is not really llama, it is llama quantized to 4 bits. It does not even have the quality of the original 7B. I could also quantize it to 1 bit and claim it runs on my RPI3.</p>
]]></description><pubDate>Wed, 15 Mar 2023 17:50:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=35172088</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=35172088</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35172088</guid></item><item><title><![CDATA[New comment by nshm in "Tell HN: Please let me just buy stuff without having to “Contact Sales”"]]></title><description><![CDATA[
<p>In such an actively developed area as TTS/ASR there is a high chance that a custom solution would fit your needs much better. The feature set of TTS is actually pretty large and hard to combine in a single ML model. No free lunch, you know.<p>For example, if you are looking for a singing voice, they might suggest an adapted model that is good specifically for singing.<p>The testing process is also not very straightforward: you need to understand what to test and how to test it properly. For example, some of their voices might be better for questions, some for news.<p>You'd better talk to them.</p>
]]></description><pubDate>Mon, 13 Feb 2023 22:43:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=34781984</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=34781984</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34781984</guid></item><item><title><![CDATA[New comment by nshm in "Show HN: Self-host Whisper As a Service with GUI and queueing"]]></title><description><![CDATA[
<p>Vosk<p><a href="https://alphacephei.com/vosk/lm" rel="nofollow">https://alphacephei.com/vosk/lm</a><p>You can restrict the vocabulary the way you like. For example, here is a chess app built with Vosk:<p><a href="https://www.chessvis.com/" rel="nofollow">https://www.chessvis.com/</a></p>
]]></description><pubDate>Mon, 13 Feb 2023 20:11:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=34779645</link><dc:creator>nshm</dc:creator><comments>https://news.ycombinator.com/item?id=34779645</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34779645</guid></item></channel></rss>