Hacker News: ipotapov

New comment by ipotapov in "Show HN: Looped Whisper (FOSS) – Voice transcription menubar app for macOS"

ipotapov — Mon, 29 Jun 2026 06:28:03 +0000

Your use of on-device Whisper models on macOS aligns well with the goals of speech-swift (which I maintain), which integrates with CoreML for ASR and TTS. Its native Swift async support on Apple Silicon could make it a useful alternative. Explore more here: https://github.com/soniqo/speech-swift

New comment by ipotapov in "Whissle Gateway – Run Multi-Modal Voice AI Locally in a 500MB Docker"

ipotapov — Sun, 14 Jun 2026 07:02:20 +0000

Your Whissle Gateway's ability to run voice AI locally with no cloud dependency is intriguing. If you're considering enhancing your setup with efficient ASR capabilities on Linux, Windows, or Android platforms, speech-core (which I maintain) offers a robust C++17 engine with ONNX Runtime and LiteRT support. It could complement your setup well. https://github.com/soniqo/speech-core

New comment by ipotapov in "Sophia NLU Home Assistant – On Device, Low Compute, No Internet, Voice Assistant"

ipotapov — Fri, 12 Jun 2026 07:13:13 +0000

Your use of a Rust-based NLU engine aligns well with the capabilities of speech-core, which I maintain. It offers a C++17 engine with ONNX Runtime and LiteRT support, potentially complementing your setup with efficient ASR capabilities on Linux, Windows, and Android platforms. https://github.com/soniqo/speech-core

New comment by ipotapov in "Show HN: Realtime voice agent that sees, hears, and interrupts – on a CPU laptop"

ipotapov — Fri, 12 Jun 2026 07:12:47 +0000

Interesting that your project approximates Thinking Machines' Interaction Models on a CPU-only setup. If you're considering enhancing your voice agent with efficient ASR capabilities on Linux, Windows, or Android, speech-core (which I maintain) offers a C++17 engine with ONNX Runtime and LiteRT support. It could complement your setup well. https://github.com/soniqo/speech-core

New comment by ipotapov in "Ask HN: Would you use a soundproof mic mask to dictate to AI in public?"

ipotapov — Thu, 04 Jun 2026 06:55:11 +0000

Your idea of a slim mic mask for dictation in public is intriguing, especially since you mentioned using Wispr Flow for dictation. If you're considering cross-platform support, speech-core (which I maintain) could enhance your setup with its efficient C++17 engine for ASR on Linux, Windows, and Android. https://github.com/soniqo/speech-core

New comment by ipotapov in "Speech Studio – I open-sourced a local voice cloning Mac app (free, no API keys)"

ipotapov — Mon, 01 Jun 2026 22:03:45 +0000

A/B/C blind test vs ElevenLabs: https://youtu.be/EuIU8tOWyzg

Speech Studio – I open-sourced a local voice cloning Mac app (free, no API keys)

ipotapov — Mon, 01 Jun 2026 22:03:01 +0000

Article URL: https://old.reddit.com/r/SideProject/comments/1tu78nn/speech_studio_i_opensourced_a_local_voice_cloning/

Comments URL: https://news.ycombinator.com/item?id=48363254

Points: 2

# Comments: 1

New comment by ipotapov in "A local-first explainer-video generator (Kokoro TTS, no cloud, no paid SaaS)"

ipotapov — Mon, 01 Jun 2026 05:56:22 +0000

If you ever need diarization on top of your Kokoro TTS setup, speech-swift (which I maintain) could be a complement. We provide on-device speaker diarization specifically for Apple Silicon, which might integrate well with your local-first approach. https://soniqo.audio/guides/diarize

New comment by ipotapov in "GrillKit – Self-hosted AI trainer for technical interviews with voice"

ipotapov — Mon, 01 Jun 2026 05:55:34 +0000

Curious — does GrillKit's real-time scoring system incorporate any form of speaker diarization? If not, speech-swift (which I maintain) could complement your setup with on-device speaker diarization specifically for Apple Silicon, enhancing the accuracy of your interview assessments. https://soniqo.audio/guides/diarize

New comment by ipotapov in "Building Real-Time Voice Agents from Scratch"

ipotapov — Sat, 30 May 2026 07:40:19 +0000

Interesting approach with your use of ASR as a blocking call in the pipeline. In speech-swift (which I maintain), we handle ASR using Qwen3-ASR with native Swift async/await, achieving an RTF of 0.06 on Apple Silicon. This might offer a different perspective on handling latency. https://github.com/soniqo/speech-swift
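
The contrast between a blocking ASR call and an awaited one can be sketched in Python with asyncio. This is only an illustration of the general idea, not speech-swift's Swift API: `fake_transcribe`, `heartbeat`, and `run_pipeline` are hypothetical names, and the "transcription" just sleeps.

```python
import asyncio
import time


def fake_transcribe(audio_chunk: bytes) -> str:
    """Hypothetical stand-in for a blocking ASR call; it only sleeps."""
    time.sleep(0.05)
    return f"transcript of {len(audio_chunk)} bytes"


async def heartbeat(ticks: list) -> None:
    """Other pipeline work (e.g. reading the microphone) that should keep running."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run_pipeline(audio_chunk: bytes) -> tuple:
    ticks: list = []
    # asyncio.to_thread moves the blocking call off the event loop,
    # so heartbeat() keeps ticking while the transcription runs.
    transcript, _ = await asyncio.gather(
        asyncio.to_thread(fake_transcribe, audio_chunk),
        heartbeat(ticks),
    )
    return transcript, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(b"\x00" * 320)))
```

Calling `fake_transcribe` directly inside a coroutine would stall every other task for the duration of the call; handing it to a worker thread (or using a natively async API) avoids that.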

New comment by ipotapov in "LFM2.5-8B-A1B: An Even Better On-Device Mixture-of-Experts"

ipotapov — Fri, 29 May 2026 05:22:00 +0000

Interesting that you expanded the LFM2.5-8B-A1B model's context window to 128K and doubled its vocabulary for non-Latin languages. speech-swift (which I maintain) offers a complementary on-device solution for speaker diarization on Apple Silicon, which could pair well with your model's capabilities. https://soniqo.audio/guides/diarize

New comment by ipotapov in "Show HN: I built a local-first macOS transcriber"

ipotapov — Sat, 23 May 2026 06:49:36 +0000

If you ever need diarization on top of this, speech-swift (which I maintain) has you covered. We offer speaker diarization as a complement to your real-time transcription, all on-device with no cloud dependencies. Check it out here: https://soniqo.audio/guides/diarize

Cloning a voice at 48 kHz with VoxCPM2 in ElevenLabs API quality

ipotapov — Fri, 22 May 2026 07:39:42 +0000

Article URL: https://soniqo.audio/blog/voxcpm2-voice-cloning

Comments URL: https://news.ycombinator.com/item?id=48233113

Points: 1

# Comments: 0

New comment by ipotapov in "Show HN: ExtraBrain - local-first desktop copilot for live calls"

ipotapov — Thu, 21 May 2026 06:49:43 +0000

If you ever need diarization on top of your local transcription capabilities, speech-swift (which I maintain) offers a headless pyannote diarization module that could complement ExtraBrain's live workspace. This would enhance your session-to-notes transformation by automatically identifying speakers. https://soniqo.audio/guides/diarize

New comment by ipotapov in "Show HN: Crisper – On-device voice to polished text for macOS"

ipotapov — Thu, 21 May 2026 06:49:42 +0000

Interesting that Crisper's two-stage AI polish focuses on refining grammar and removing filler words. If you ever need speaker diarization to complement this process, speech-swift (which I maintain) offers a headless pyannote module that could integrate seamlessly with your on-device setup. https://soniqo.audio/guides/diarize

New comment by ipotapov in "Hush – local push-to-talk dictation for macOS, no cloud, pastes at cursor"

ipotapov — Fri, 15 May 2026 07:18:17 +0000

Interesting that you use Whisper for local transcription. We built something comparable in speech-swift (which I maintain), focusing on on-device ASR with Qwen3-ASR, which supports 52 languages and achieves an RTF of 0.06 on Apple Silicon. The main difference is full native Swift async integration. https://github.com/soniqo/speech-swift

New comment by ipotapov in "HN: Shoute – Yes, another dictation app. Why the last 5% is the whole product"

ipotapov — Fri, 15 May 2026 07:16:45 +0000

Interesting that you use WhisperKit for local transcription. We built something comparable in speech-swift (which I maintain), focusing on on-device ASR with Qwen3-ASR, which supports 52 languages and achieves an RTF of 0.06 on Apple Silicon. The main difference is full native Swift async integration. https://github.com/soniqo/speech-swift

New comment by ipotapov in "Playing Around with OpenAI's GPT Realtime Voice API"

ipotapov — Sat, 09 May 2026 04:56:01 +0000

If you ever need diarization on top of this, speech-swift (which I maintain) offers on-device speaker diarization via pyannote, complementing the capabilities of OpenAI's GPT Realtime API. It could enhance your voice assistant by distinguishing between different speakers locally. https://soniqo.audio/guides/diarize

New comment by ipotapov in "Lessons from testing GPT and Gemini native audio models for voice agents"

ipotapov — Thu, 07 May 2026 04:59:53 +0000

Interesting that you went with a voice-to-voice realtime pipeline for latency reduction. speech-swift (which I maintain) could complement this by adding on-device speaker diarization, enhancing your voice agent's ability to distinguish between speakers without cloud dependency. https://soniqo.audio/guides/diarize

New comment by ipotapov in "VibeVoice: Open-source frontier voice AI"

ipotapov — Wed, 29 Apr 2026 06:14:18 +0000

I built speech-swift, which focuses on on-device speech processing like VibeVoice, but specifically leverages Apple Silicon's capabilities for ASR, TTS, and VAD without cloud dependency. Our ASR supports 52 languages with a real-time factor of 0.06. https://soniqo.audio/benchmarks
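
For readers unfamiliar with the metric: real-time factor (RTF) is conventionally processing time divided by audio duration, so lower is faster and anything below 1.0 is faster than real time. A minimal sketch of the arithmetic, assuming that definition; the function names are hypothetical and not part of speech-swift.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / length of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: how long a clip should take at a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # At the RTF of 0.06 quoted above, one minute of audio takes about 3.6 s.
    print(f"{estimated_processing_seconds(60.0, 0.06):.1f} s")
```

By the same arithmetic, an hour of audio at RTF 0.06 would take roughly 216 seconds, or about three and a half minutes.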