<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ilyakaminsky</title><link>https://news.ycombinator.com/user?id=ilyakaminsky</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 06 Apr 2026 04:44:05 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ilyakaminsky" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ilyakaminsky in "Nano Banana can be prompt engineered for nuanced AI image generation"]]></title><description><![CDATA[
<p>I use Gemini CLI on a daily basis. It used to crash often and I'd lose the chat history. I found this tool called ai-cli-log [1] and it does something similar out of the box. I don't run Gemini CLI without it.<p>[1] <a href="https://github.com/alingse/ai-cli-log" rel="nofollow">https://github.com/alingse/ai-cli-log</a></p>
]]></description><pubDate>Fri, 14 Nov 2025 02:24:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=45923170</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=45923170</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45923170</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Launch HN: Strata (YC X25) – One MCP server for AI to handle thousands of tools"]]></title><description><![CDATA[
<p>How can I submit my service to your website? Is there a simpler way than creating a PR here? <a href="https://github.com/Klavis-AI/klavis/tree/main/mcp_servers" rel="nofollow">https://github.com/Klavis-AI/klavis/tree/main/mcp_servers</a></p>
]]></description><pubDate>Wed, 24 Sep 2025 01:20:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=45355087</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=45355087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45355087</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Show HN: Whispering – Open-source, local-first dictation you can trust"]]></title><description><![CDATA[
<p>Shameless plug -- check out speechischeap.com<p>I spent three months perfecting the speaker diarization pipeline and I think you'll be quite pleased with the results.</p>
]]></description><pubDate>Tue, 19 Aug 2025 12:22:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44950770</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44950770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44950770</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Fast"]]></title><description><![CDATA[
<p>> i can run it on consumer hardware for vastly cheaper than the cloud<p>Woah, that's really cool, CJ! I've been toying with the idea of standing up a cluster of older iPhones to run Apple's Speech framework. [1] The inspiration came from this blog post [2] where the author is using it for OCR. A couple of things are holding me back: (1) the OSS models are better according to the current benchmarks and (2) I have customers all over the world, so geographical load-balancing is a real factor. With that said, I'll definitely spend some time checking out your work. Thanks for sharing!<p>[1] <a href="https://developer.apple.com/documentation/speech" rel="nofollow">https://developer.apple.com/documentation/speech</a><p>[2] <a href="https://terminalbytes.com/iphone-8-solar-powered-vision-ocr-server/" rel="nofollow">https://terminalbytes.com/iphone-8-solar-powered-vision-ocr-...</a></p>
]]></description><pubDate>Wed, 30 Jul 2025 23:12:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44740636</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44740636</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44740636</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Fast"]]></title><description><![CDATA[
<p>Hmm… That's a good point. I recall a few instances where I went too far to the detriment of production. Having a trusty testing and benchmarking suite thankfully helped with keeping things more stable. As a solo developer, I really enjoy the development process, so while that bit is costly, I didn't really consider that until you mentioned it.</p>
]]></description><pubDate>Wed, 30 Jul 2025 21:21:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44739652</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44739652</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44739652</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Problem solving using Markov chains (2007) [pdf]"]]></title><description><![CDATA[
<p>TIL, thanks! I asked Claude to generate a simulator [1] based on your comment. I think it came out well.<p>[1] <a href="https://claude.ai/public/artifacts/1b921a50-897e-4d9e-8cfa-0b1c32a35704" rel="nofollow">https://claude.ai/public/artifacts/1b921a50-897e-4d9e-8cfa-0...</a></p>
]]></description><pubDate>Wed, 30 Jul 2025 20:45:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44739308</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44739308</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44739308</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Fast"]]></title><description><![CDATA[
<p>Fast is also cheap. Especially in the world of cloud computing where you pay by the second. The only way I could create a profitable transcription service [1] that undercuts the rest was by optimizing every little thing along the way. For instance, just yesterday I learned that the image I've put together is 2.5× smaller than the next open source variant. That means faster cold boots, which reduces the cost (and provides a better service).<p>[1] <a href="https://speechischeap.com" rel="nofollow">https://speechischeap.com</a></p>
]]></description><pubDate>Wed, 30 Jul 2025 19:54:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44738809</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44738809</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44738809</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic"]]></title><description><![CDATA[
<p>Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way: VAD, inference server, hardware optimization, etc. But nothing that would compromise on quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without incurring any losses to the WER.</p>
]]></description><pubDate>Tue, 22 Jul 2025 18:08:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=44650923</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44650923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44650923</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic"]]></title><description><![CDATA[
<p>I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.<p>[1] <a href="https://speechischeap.com" rel="nofollow">https://speechischeap.com</a></p>
]]></description><pubDate>Tue, 22 Jul 2025 11:18:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44645506</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44645506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44645506</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "OpenAI charges by the minute, so speed up your audio"]]></title><description><![CDATA[
<p>Not yet. The gains in efficiency come from optimizing the speedup factor. Real-time audio cannot be processed any faster than 1× by definition.</p>
]]></description><pubDate>Tue, 08 Jul 2025 16:16:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44501440</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44501440</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44501440</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "OpenAI charges by the minute, so speed up your audio"]]></title><description><![CDATA[
<p>It's sustainable, but not enough to retire on at this point.<p>> Just wondering if I cam build a retirement out of APIs :)<p>I think it's possible, but you need to find a way to add value beyond the commodity itself (e.g., audio classification and speaker diarization in my case).</p>
]]></description><pubDate>Thu, 26 Jun 2025 07:04:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=44384932</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44384932</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44384932</guid></item><item><title><![CDATA[New comment by ilyakaminsky in "OpenAI charges by the minute, so speed up your audio"]]></title><description><![CDATA[
<p>I've already done that [1]. A fraction of the price, 24-hour limit per file, and speedup tricks like the OP's are welcome. :)<p>[1] <a href="https://speechischeap.com" rel="nofollow">https://speechischeap.com</a></p>
]]></description><pubDate>Wed, 25 Jun 2025 18:39:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44380550</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=44380550</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44380550</guid></item><item><title><![CDATA[Show HN: I built Speech is Cheap for fast, long-form audio transcription]]></title><description><![CDATA[
<p>Hi HN, I created a transcription service called Speech is Cheap. I put in a lot of effort to make it fast without losing too much accuracy. For instance, it takes 10 minutes to transcribe ~21 hours of Little Women [1]. I also just released the add-ons to classify audio, diarize speakers, and filter out segments based on the models' confidence scores. So it takes about a minute to process the noisy, hour-long Starship 8 launch with all those features enabled [2]. Transcribing these respective files costs $1.16 and $0.11 for subscribers or $2.50 and $0.23 for pay-as-you-go users.<p>I released the MVP about six months ago. But some users wanted to parse speakers, so I spent the last couple of months reworking the entire pipeline. Initially, the biggest challenge was creating the custom voice activity detection (VAD) functionality. Once that was done, I got more confident and incorporated a powerful diarization model as well. The rest of the time was spent on fine-tuning and optimizing everything end-to-end. I learned a ton and will blog about it soon.<p>Sharing Speech is Cheap with HN is a big step for me. My main focus has been on the engineering side, so I'm a bit puzzled by how to properly market it. If you have experience with what genuinely works to spread the word, I'd be very grateful to hear your thoughts and perspectives.<p>You can try it out completely for free by picking the Pay-as-you-go option and applying the $5-off promo code `HN5`, which is good for 2500 minutes of regular transcriptions.
I'll stick around to answer any questions.<p>[1] <a href="https://youtu.be/yduuUGUj5Bg" rel="nofollow">https://youtu.be/yduuUGUj5Bg</a> » <a href="https://cdn.speechischeap.com/out/little_women.json" rel="nofollow">https://cdn.speechischeap.com/out/little_women.json</a><p>[2] <a href="https://x.com/SpaceX/status/1897438948458189156" rel="nofollow">https://x.com/SpaceX/status/1897438948458189156</a> » <a href="https://cdn.speechischeap.com/out/starship8.json" rel="nofollow">https://cdn.speechischeap.com/out/starship8.json</a></p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43907594">https://news.ycombinator.com/item?id=43907594</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 06 May 2025 17:29:59 +0000</pubDate><link>https://speechischeap.com</link><dc:creator>ilyakaminsky</dc:creator><comments>https://news.ycombinator.com/item?id=43907594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43907594</guid></item></channel></rss>