<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ipotapov</title><link>https://news.ycombinator.com/user?id=ipotapov</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 30 Jun 2026 21:24:13 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ipotapov" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ipotapov in "Show HN: Looped Whisper (FOSS) – Voice transcription menubar app for macOS"]]></title><description><![CDATA[
<p>Your use of Whisper models on-device for macOS aligns well with the goals of speech-swift (which I maintain), offering robust integration with CoreML for ASR and TTS. It could serve as an alternative with its native Swift async support on Apple Silicon. Explore more here: <a href="https://github.com/soniqo/speech-swift" rel="nofollow">https://github.com/soniqo/speech-swift</a></p>
]]></description><pubDate>Mon, 29 Jun 2026 06:28:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48715565</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48715565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48715565</guid></item><item><title><![CDATA[New comment by ipotapov in "Whissle Gateway – Run Multi-Modal Voice AI Locally in a 500MB Docker"]]></title><description><![CDATA[
<p>Your Whissle Gateway's ability to run voice AI locally with no cloud dependency is intriguing. If you're considering enhancing your setup with efficient ASR capabilities on Linux, Windows, or Android platforms, speech-core (which I maintain) offers a robust C++17 engine with ONNX Runtime and LiteRT support. It could complement your setup well. <a href="https://github.com/soniqo/speech-core" rel="nofollow">https://github.com/soniqo/speech-core</a></p>
]]></description><pubDate>Sun, 14 Jun 2026 07:02:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=48524863</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48524863</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48524863</guid></item><item><title><![CDATA[New comment by ipotapov in "Sophia NLU Home Assistant – On Device, Low Compute, No Internet, Voice Assistant"]]></title><description><![CDATA[
<p>Your use of a Rust-based NLU engine aligns well with the capabilities of speech-core, which I maintain. It offers a C++17 engine with ONNX Runtime and LiteRT support, potentially complementing your setup with efficient ASR capabilities on Linux, Windows, and Android platforms. <a href="https://github.com/soniqo/speech-core" rel="nofollow">https://github.com/soniqo/speech-core</a></p>
]]></description><pubDate>Fri, 12 Jun 2026 07:13:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48500851</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48500851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48500851</guid></item><item><title><![CDATA[New comment by ipotapov in "Show HN: Realtime voice agent that sees, hears, and interrupts – on a CPU laptop"]]></title><description><![CDATA[
<p>Interesting that your project approximates Thinking Machines' Interaction Models on a CPU-only setup. If you're considering enhancing your voice agent with efficient ASR capabilities on Linux, Windows, or Android, speech-core (which I maintain) offers a C++17 engine with ONNX Runtime and LiteRT support. It could complement your setup well. <a href="https://github.com/soniqo/speech-core" rel="nofollow">https://github.com/soniqo/speech-core</a></p>
]]></description><pubDate>Fri, 12 Jun 2026 07:12:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48500847</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48500847</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48500847</guid></item><item><title><![CDATA[New comment by ipotapov in "Ask HN: Would you use a soundproof mic mask to dictate to AI in public?"]]></title><description><![CDATA[
<p>Your idea of a slim mic mask for dictation in public is intriguing, especially since you mentioned using Wispr Flow for dictation. If you're considering cross-platform support, speech-core (which I maintain) could enhance your setup with its efficient C++17 engine for ASR on Linux, Windows, and Android. <a href="https://github.com/soniqo/speech-core" rel="nofollow">https://github.com/soniqo/speech-core</a></p>
]]></description><pubDate>Thu, 04 Jun 2026 06:55:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48395031</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48395031</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48395031</guid></item><item><title><![CDATA[New comment by ipotapov in "Speech Studio – I open-sourced a local voice cloning Mac app (free, no API keys)"]]></title><description><![CDATA[
<p>A/B/C blind test vs ElevenLabs: <a href="https://youtu.be/EuIU8tOWyzg" rel="nofollow">https://youtu.be/EuIU8tOWyzg</a></p>
]]></description><pubDate>Mon, 01 Jun 2026 22:03:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48363262</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48363262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48363262</guid></item><item><title><![CDATA[Speech Studio – I open-sourced a local voice cloning Mac app (free, no API keys)]]></title><description><![CDATA[
<p>Article URL: <a href="https://old.reddit.com/r/SideProject/comments/1tu78nn/speech_studio_i_opensourced_a_local_voice_cloning/">https://old.reddit.com/r/SideProject/comments/1tu78nn/speech_studio_i_opensourced_a_local_voice_cloning/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48363254">https://news.ycombinator.com/item?id=48363254</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 01 Jun 2026 22:03:01 +0000</pubDate><link>https://old.reddit.com/r/SideProject/comments/1tu78nn/speech_studio_i_opensourced_a_local_voice_cloning/</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48363254</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48363254</guid></item><item><title><![CDATA[New comment by ipotapov in "A local-first explainer-video generator (Kokoro TTS, no cloud, no paid SaaS)"]]></title><description><![CDATA[
<p>If you ever need diarization on top of your Kokoro TTS setup, speech-swift (which I maintain) could be a complement. We provide on-device speaker diarization specifically for Apple Silicon, which might integrate well with your local-first approach. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Mon, 01 Jun 2026 05:56:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=48353114</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48353114</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48353114</guid></item><item><title><![CDATA[New comment by ipotapov in "GrillKit – Self-hosted AI trainer for technical interviews with voice"]]></title><description><![CDATA[
<p>Curious — does GrillKit's real-time scoring system incorporate any form of speaker diarization? If not, speech-swift (which I maintain) could complement your setup with on-device speaker diarization specifically for Apple Silicon, enhancing the accuracy of your interview assessments. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Mon, 01 Jun 2026 05:55:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48353108</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48353108</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48353108</guid></item><item><title><![CDATA[New comment by ipotapov in "Building Real-Time Voice Agents from Scratch"]]></title><description><![CDATA[
<p>Interesting approach with your use of ASR as a blocking call in the pipeline. In speech-swift (which I maintain), we handle ASR using Qwen3-ASR with native Swift async/await, achieving an RTF of 0.06 on Apple Silicon. This might offer a different perspective on handling latency. <a href="https://github.com/soniqo/speech-swift" rel="nofollow">https://github.com/soniqo/speech-swift</a></p>
]]></description><pubDate>Sat, 30 May 2026 07:40:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=48333701</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48333701</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48333701</guid></item><item><title><![CDATA[New comment by ipotapov in "LFM2.5-8B-A1B: An Even Better On-Device Mixture-of-Experts"]]></title><description><![CDATA[
<p>Interesting that you expanded the LFM2.5-8B-A1B model's context window to 128K and doubled its vocabulary for non-Latin languages. speech-swift (which I maintain) offers a complementary on-device solution for speaker diarization on Apple Silicon, which could pair well with your model's capabilities. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Fri, 29 May 2026 05:22:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48319375</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48319375</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48319375</guid></item><item><title><![CDATA[New comment by ipotapov in "Show HN: I built a local-first macOS transcriber"]]></title><description><![CDATA[
<p>If you ever need diarization on top of this, speech-swift (which I maintain) has you covered. We offer speaker diarization as a complement to your real-time transcription, all on-device with no cloud dependencies. Check it out here: <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Sat, 23 May 2026 06:49:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48245340</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48245340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48245340</guid></item><item><title><![CDATA[Cloning a voice at 48 kHz with VoxCPM2 in ElevenLabs API quality]]></title><description><![CDATA[
<p>Article URL: <a href="https://soniqo.audio/blog/voxcpm2-voice-cloning">https://soniqo.audio/blog/voxcpm2-voice-cloning</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48233113">https://news.ycombinator.com/item?id=48233113</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 22 May 2026 07:39:42 +0000</pubDate><link>https://soniqo.audio/blog/voxcpm2-voice-cloning</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48233113</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48233113</guid></item><item><title><![CDATA[New comment by ipotapov in "Show HN: ExtraBrain - local-first desktop copilot for live calls"]]></title><description><![CDATA[
<p>If you ever need diarization on top of your local transcription capabilities, speech-swift (which I maintain) offers a headless pyannote diarization module that could complement ExtraBrain's live workspace. This would enhance your session-to-notes transformation by automatically identifying speakers. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Thu, 21 May 2026 06:49:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=48218816</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48218816</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48218816</guid></item><item><title><![CDATA[New comment by ipotapov in "Show HN: Crisper – On-device voice to polished text for macOS"]]></title><description><![CDATA[
<p>Interesting that Crisper's two-stage AI polish focuses on refining grammar and removing filler words. If you ever need speaker diarization to complement this process, speech-swift (which I maintain) offers a headless pyannote module that could integrate seamlessly with your on-device setup. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Thu, 21 May 2026 06:49:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48218815</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48218815</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48218815</guid></item><item><title><![CDATA[New comment by ipotapov in "Hush – local push-to-talk dictation for macOS, no cloud, pastes at cursor"]]></title><description><![CDATA[
<p>Interesting that you use Whisper for local transcription. We built something comparable in speech-swift (which I maintain), focusing on on-device ASR with Qwen3-ASR, which supports 52 languages and achieves an RTF of 0.06 on Apple Silicon. The main difference is full native Swift async integration. <a href="https://github.com/soniqo/speech-swift" rel="nofollow">https://github.com/soniqo/speech-swift</a></p>
]]></description><pubDate>Fri, 15 May 2026 07:18:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=48145523</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48145523</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48145523</guid></item><item><title><![CDATA[New comment by ipotapov in "HN: Shoute – Yes, another dictation app. Why the last 5% is the whole product"]]></title><description><![CDATA[
<p>Interesting that you use WhisperKit for local transcription. We built something comparable in speech-swift (which I maintain), focusing on on-device ASR with Qwen3-ASR, which supports 52 languages and achieves an RTF of 0.06 on Apple Silicon. The main difference is full native Swift async integration. <a href="https://github.com/soniqo/speech-swift" rel="nofollow">https://github.com/soniqo/speech-swift</a></p>
]]></description><pubDate>Fri, 15 May 2026 07:16:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48145511</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48145511</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48145511</guid></item><item><title><![CDATA[New comment by ipotapov in "Playing Around with OpenAI's GPT Realtime Voice API"]]></title><description><![CDATA[
<p>If you ever need diarization on top of this, speech-swift (which I maintain) offers on-device speaker diarization via pyannote, complementing the capabilities of OpenAI's GPT Realtime API. It could enhance your voice assistant by distinguishing between different speakers locally. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Sat, 09 May 2026 04:56:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48071947</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48071947</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48071947</guid></item><item><title><![CDATA[New comment by ipotapov in "Lessons from testing GPT and Gemini native audio models for voice agents"]]></title><description><![CDATA[
<p>Interesting that you went with a voice-to-voice realtime pipeline for latency reduction. speech-swift (which I maintain) could complement this by adding on-device speaker diarization, enhancing your voice agent's ability to distinguish between speakers without cloud dependency. <a href="https://soniqo.audio/guides/diarize" rel="nofollow">https://soniqo.audio/guides/diarize</a></p>
]]></description><pubDate>Thu, 07 May 2026 04:59:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48045559</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=48045559</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48045559</guid></item><item><title><![CDATA[New comment by ipotapov in "VibeVoice: Open-source frontier voice AI"]]></title><description><![CDATA[
<p>I built speech-swift, which focuses on on-device speech processing like VibeVoice, but specifically leverages Apple Silicon's capabilities for ASR, TTS, and VAD without cloud dependency. Our ASR supports 52 languages with a real-time factor of 0.06. <a href="https://soniqo.audio/benchmarks" rel="nofollow">https://soniqo.audio/benchmarks</a></p>
]]></description><pubDate>Wed, 29 Apr 2026 06:14:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47944740</link><dc:creator>ipotapov</dc:creator><comments>https://news.ycombinator.com/item?id=47944740</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47944740</guid></item></channel></rss>