<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cootsnuck</title><link>https://news.ycombinator.com/user?id=cootsnuck</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 02:19:12 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cootsnuck" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cootsnuck in "Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>Yup, you're absolutely right. The open source models do have their rough edges. I use NVIDIA's Parakeet v3 model a lot locally, and it will occasionally do this thing where it just repeats a word like a dozen times.</p>
]]></description><pubDate>Tue, 07 Apr 2026 16:29:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47677812</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47677812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47677812</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>Handy has Windows support. <a href="https://handy.computer/" rel="nofollow">https://handy.computer/</a></p>
]]></description><pubDate>Mon, 06 Apr 2026 23:50:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668947</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47668947</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668947</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>Yup, Handy is the one that made me stop looking for local open source alternatives to Wispr Flow.<p>I'll give a shoutout as well to Glimpse: <a href="https://github.com/LegendarySpy/Glimpse" rel="nofollow">https://github.com/LegendarySpy/Glimpse</a></p>
]]></description><pubDate>Mon, 06 Apr 2026 23:48:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668925</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47668925</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668925</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>Interesting. My Pixel 7 transcription is barely usable for me. Makes way too many mistakes and defeats the purpose of me not having to type, but maybe that's just my experience.<p>The latest open source local STT models people are running on devices are significantly more robust (e.g. whisper models, parakeet models, etc.). So background noise, mumbling, and/or just not having a perfect audio environment doesn't trip up the SoTA models <i>as much</i> (all of them still do get tripped up).<p>I work in voice AI and am using these models (both proprietary and local open source) every day. Night and day different for me.</p>
]]></description><pubDate>Mon, 06 Apr 2026 23:45:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668903</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47668903</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668903</guid></item><item><title><![CDATA[AI's Elephant in the Room]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.insidevoice.ai/p/ais-elephant-in-the-room">https://www.insidevoice.ai/p/ais-elephant-in-the-room</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47478985">https://news.ycombinator.com/item?id=47478985</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 22 Mar 2026 16:11:37 +0000</pubDate><link>https://www.insidevoice.ai/p/ais-elephant-in-the-room</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47478985</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47478985</guid></item><item><title><![CDATA[New comment by cootsnuck in "Tinybox – A powerful computer for deep learning"]]></title><description><![CDATA[
<p>How many businesses have the capabilities and expertise to train their own models?</p>
]]></description><pubDate>Sun, 22 Mar 2026 01:35:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47473528</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47473528</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47473528</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs"]]></title><description><![CDATA[
<p>Super cool. Love seeing these writeups of hobbyists getting their hands dirty, breaking things, and then coming out on the other side of it with something interesting.</p>
]]></description><pubDate>Tue, 10 Mar 2026 15:50:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47324897</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47324897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47324897</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: DenchClaw – Local CRM on Top of OpenClaw"]]></title><description><![CDATA[
<p>Yea, it has been a little shocking to me that the rising narratives around "AI agents everywhere" and "enable the web for AI agents" require what we've all been wanting for a while on the web (openness and interoperability) but that the same big players in tech have been clearly against for a long time. The fact that Google recently released that Google Workspace CLI (<a href="https://github.com/googleworkspace/cli" rel="nofollow">https://github.com/googleworkspace/cli</a>) is a perfect example.<p>They could've released something like that years ago (the discovery service it's built on has existed for over a decade), but creating a simple, accessible, unified CLI for general integration apparently wasn't worth it until agents became the hot thing.<p>I wonder when / if there will be a rug pull on all of this, because I really don't see what the long-term incentives are for incumbent tech platforms to make it easy for automated systems to essentially pull users away from the actual platform. I guess they're focused on the short-term incentives. And once they decide the party's over, promising upstarts and competition can get absorbed and it'll be business as usual. Idk, we'll see.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:58:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47314596</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47314596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47314596</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: I built a sub-500ms latency voice agent from scratch"]]></title><description><![CDATA[
<p>In OpenAI's own words about semantic_vad:<p>> Chunks the audio when the model believes based on the words said by the user that they have completed their utterance.<p>Source: <a href="https://developers.openai.com/api/docs/guides/realtime-vad" rel="nofollow">https://developers.openai.com/api/docs/guides/realtime-vad</a><p>OpenAI's Semantic mode is looking at the semantic meaning of the transcribed text to make an educated guess about where the user's end of utterance is.<p>According to Deepgram, Flux's end-of-turn detection is not just a semantic VAD (which inherently is a <i>separate model</i> from the STT model that's doing the transcribing). Deepgram describes Flux as:<p>> the same model that produces transcripts is also responsible for modeling conversational flow and turn detection.<p>[...]<p>>  With complete semantic, acoustic, and full-turn context in a fused model, Flux is able to very accurately detect turn ends and avoid the premature interruptions common with traditional approaches.<p>Source: <a href="https://deepgram.com/learn/introducing-flux-conversational-speech-recognition#native-turn-detection" rel="nofollow">https://deepgram.com/learn/introducing-flux-conversational-s...</a><p>So according to them, end-of-turn detection isn't just based on the semantic content of the transcript (which makes sense given the latency), but on the characteristics of the actual audio waveform itself as well.<p>Pipecat (an open source voice AI orchestration platform) seemingly does this too with its smart-turn native turn detection model (minus the built-in transcription): <a href="https://github.com/pipecat-ai/smart-turn" rel="nofollow">https://github.com/pipecat-ai/smart-turn</a></p>
]]></description><pubDate>Tue, 03 Mar 2026 06:34:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47228894</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47228894</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47228894</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: I built a sub-500ms latency voice agent from scratch"]]></title><description><![CDATA[
<p>I've been working solely on voice agents for the past couple of years (and have worked at one of the frontier voice AI companies).<p>The cascading model (STT -> LLM -> TTS) is unlikely to go away anytime soon for a whole lot of reasons. A big one is observability. The people paying for voice agents are enterprises. Enterprises care about reliability and liability. The cascading approach is much more amenable to specialization (rather than raw flexibility / generality) and auditability.<p>Organizations in regulated industries (e.g. healthcare, finance, education) need to be able to see what a voice agent "heard" before it tries to "act" on transcribed text, and the same goes for seeing what LLM output text is going to be "said" before it's actually synthesized and played back.<p>Speech-to-Speech (end-to-end) models definitely have a place for more "narrative" use cases (think interviewing, conducting surveys / polls, etc.).<p>But in my experience working with clients, they are clamoring for systems and orchestration that actually use some good ol' fashioned engineering and that don't solely rely on the latest-and-greatest SoTA ML models.</p>
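<p>To make the auditability point concrete, here's a toy sketch of where the audit hooks slot into a cascading pipeline. The three stage functions are stubs I made up for illustration, not any real vendor API -- the point is that every intermediate artifact gets logged before the next stage acts on it:</p>

```python
# Toy cascading voice-agent pipeline: STT -> LLM -> TTS, with an audit
# trail written between stages. All three stage functions are stand-in
# stubs, not real model calls.

import json
import time

def transcribe(audio: bytes) -> str:
    """STT stage stub: a real system would run a streaming STT model here."""
    return "what are your hours?"

def generate(transcript: str) -> str:
    """LLM stage stub: a real system would prompt an LLM here."""
    return "We're open 9am to 5pm, Monday through Friday."

def synthesize(text: str) -> bytes:
    """TTS stage stub: a real system would return synthesized audio here."""
    return text.encode("utf-8")

def handle_turn(audio: bytes, audit_log: list) -> bytes:
    # Log what the agent "heard" before the LLM acts on it...
    transcript = transcribe(audio)
    audit_log.append({"stage": "stt", "heard": transcript, "ts": time.time()})

    # ...and what it's about to "say" before any audio is synthesized.
    reply = generate(transcript)
    audit_log.append({"stage": "llm", "to_say": reply, "ts": time.time()})

    return synthesize(reply)

audit_log: list = []
audio_out = handle_turn(b"<caller audio>", audit_log)
print(json.dumps([entry["stage"] for entry in audit_log]))  # stage order
```

<p>A speech-to-speech model collapses those two logging points into one opaque hop, which is exactly what regulated buyers push back on.</p>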
]]></description><pubDate>Tue, 03 Mar 2026 06:15:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47228784</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47228784</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47228784</guid></item><item><title><![CDATA[New comment by cootsnuck in "Show HN: I built a sub-500ms latency voice agent from scratch"]]></title><description><![CDATA[
<p>Yea, Deepgram Flux is the secret sauce. Doesn't get talked about much.<p>For anyone curious: <a href="https://flux.deepgram.com/" rel="nofollow">https://flux.deepgram.com/</a></p>
]]></description><pubDate>Tue, 03 Mar 2026 05:55:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47228667</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47228667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47228667</guid></item><item><title><![CDATA[New comment by cootsnuck in "We do not think Anthropic should be designated as a supply chain risk"]]></title><description><![CDATA[
<p>Except an LLM actually is a piece of software. And the brain is not what you said.</p>
]]></description><pubDate>Sun, 01 Mar 2026 17:06:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47208514</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47208514</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47208514</guid></item><item><title><![CDATA[New comment by cootsnuck in "Microgpt"]]></title><description><![CDATA[
<p>It was: <a href="https://news.ycombinator.com/item?id=47000263">https://news.ycombinator.com/item?id=47000263</a></p>
]]></description><pubDate>Sun, 01 Mar 2026 07:33:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47204549</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47204549</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47204549</guid></item><item><title><![CDATA[New comment by cootsnuck in "Discord cuts ties with identity verification software, Persona"]]></title><description><![CDATA[
<p>"Good" is subjective. But yes, all wealth creation requires working with other people. No one is an island. And most people are increasingly disturbed by the types of decisions required to amass more wealth than sovereign nations.</p>
]]></description><pubDate>Tue, 24 Feb 2026 16:48:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47139319</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47139319</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47139319</guid></item><item><title><![CDATA[New comment by cootsnuck in "Discord cuts ties with identity verification software, Persona"]]></title><description><![CDATA[
<p>Yea, it's puzzling to me that this isn't asked of folks like Altman and Amodei in every interview. Maybe it's because Altman would just start shilling his eye-scanning orb and repeating "WORLD COIN" ad nauseam. Either way, they should be getting pressed on this by all media.</p>
]]></description><pubDate>Tue, 24 Feb 2026 16:44:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47139262</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47139262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47139262</guid></item><item><title><![CDATA[New comment by cootsnuck in "Nvidia and OpenAI abandon unfinished $100B deal in favour of $30B investment"]]></title><description><![CDATA[
<p>Ed is the anger translator in my head. Good stuff.</p>
]]></description><pubDate>Fri, 20 Feb 2026 17:01:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47090595</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47090595</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47090595</guid></item><item><title><![CDATA[New comment by cootsnuck in ""Token anxiety", a slot machine by any other name"]]></title><description><![CDATA[
<p>Fellow Midwesterner?</p>
]]></description><pubDate>Tue, 17 Feb 2026 16:28:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47049306</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47049306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47049306</guid></item><item><title><![CDATA[New comment by cootsnuck in "14-year-old Miles Wu folded origami pattern that holds 10k times its own weight"]]></title><description><![CDATA[
<p>Totally agree. Adult life is just mentally taxing. I'm more curious and more eager to learn now in my 30s than I was in any of my schooling. The learning isn't hard, but the energy regulation is.<p>I think it's so easy for people to discount "mental energy" since culturally we don't often acknowledge it as a finite resource the same way we do physical energy. Well, maybe the problem is that we view them as separate things in the first place.<p>When I was younger I just didn't have to worry about so much stuff.</p>
]]></description><pubDate>Tue, 17 Feb 2026 15:53:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47048861</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47048861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47048861</guid></item><item><title><![CDATA[New comment by cootsnuck in "IBM tripling entry-level jobs after finding the limits of AI adoption"]]></title><description><![CDATA[
<p>They said they're going to invest something like $150B over five years, which is quite a bit smaller than other big tech firms' commitments.<p>They have their Granite family of models, but those are small language models, so surely significantly fewer resources are going into them.</p>
]]></description><pubDate>Sun, 15 Feb 2026 00:33:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47019912</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=47019912</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47019912</guid></item><item><title><![CDATA[New comment by cootsnuck in "Ask HN: The Coming Class War"]]></title><description><![CDATA[
<p>There's no "end" per se, but shifting dynamics. I personally think we are currently seeing a "slow", but noteworthy-in-hindsight, shift in interest toward and development of alternatives to the big tech monopolies' platforms: things like Bluesky, Framework laptops, minimalist cell phones, and even smaller local language models and other types of useful local ML models.<p>I don't think it's going to be anything like 50% or even 30% of users using non-flagship hardware or software products. But it could still be significant. And I think the more important thing isn't going to be market share so much as proof of viability. More successful examples will beget more.<p>It's about planting seeds from which future digital ecosystems can grow -- ones that have interoperability, functionality, and openness built into their foundations.<p>I believe that whatever drove you to make this post, and the way I feel, are not unique -- they're part of a larger swell of similar sentiment.<p>Throw in other factors too, like the mass tech layoffs and the continued doubling down of tech barons on their cravings to intermingle with the surveillance state and military-industrial complex... I just can't see how the future doesn't have <i>more</i> people disillusioned with the current state of the tech industry.<p>I think big tech will continue to overplay its hand, and the mess that comes after will be an opportunity to give people what they want and to show alternatives to what's already been done that we know won't work out.</p>
]]></description><pubDate>Sun, 08 Feb 2026 02:53:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=46930895</link><dc:creator>cootsnuck</dc:creator><comments>https://news.ycombinator.com/item?id=46930895</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46930895</guid></item></channel></rss>