Hacker News: ilyakaminsky

New comment by ilyakaminsky in "Nano Banana can be prompt engineered for nuanced AI image generation"

ilyakaminsky — Fri, 14 Nov 2025 02:24:17 +0000

I use Gemini CLI on a daily basis. It used to crash often and I'd lose the chat history. I found this tool called ai-cli-log [1] and it does something similar out of the box. I don't run Gemini CLI without it.

[1] https://github.com/alingse/ai-cli-log

New comment by ilyakaminsky in "Launch HN: Strata (YC X25) – One MCP server for AI to handle thousands of tools"

ilyakaminsky — Wed, 24 Sep 2025 01:20:19 +0000

How can I submit my service to your website? Is there a simpler way than creating a PR here? https://github.com/Klavis-AI/klavis/tree/main/mcp_servers

New comment by ilyakaminsky in "Show HN: Whispering – Open-source, local-first dictation you can trust"

ilyakaminsky — Tue, 19 Aug 2025 12:22:13 +0000

Shameless plug -- check out speechischeap.com

I spent three months perfecting the speaker diarization pipeline and I think you'll be quite pleased with the results.

New comment by ilyakaminsky in "Fast"

ilyakaminsky — Wed, 30 Jul 2025 23:12:23 +0000

> i can run it on consumer hardware for vastly cheaper than the cloud

Woah, that's really cool, CJ! I've been toying with the idea of standing up a cluster of older iPhones to run Apple's Speech framework. [1] The inspiration came from this blog post [2] where the author is using it for OCR. A couple of things are holding me back: (1) the OSS models are better according to the current benchmarks and (2) I have customers all over the world, so geographical load-balancing is a real factor. With that said, I'll definitely spend some time checking out your work. Thanks for sharing!

[1] https://developer.apple.com/documentation/speech

[2] https://terminalbytes.com/iphone-8-solar-powered-vision-ocr-...

New comment by ilyakaminsky in "Fast"

ilyakaminsky — Wed, 30 Jul 2025 21:21:47 +0000

Hmm… That's a good point. I recall a few instances where I went too far to the detriment of production. Having a trusty testing and benchmarking suite thankfully helped with keeping things more stable. As a solo developer, I really enjoy the development process, so while that bit is costly, I didn't really consider that until you mentioned it.

New comment by ilyakaminsky in "Problem solving using Markov chains (2007) [pdf]"

ilyakaminsky — Wed, 30 Jul 2025 20:45:58 +0000

TIL, thanks! I asked Claude to generate a simulator [1] based on your comment. I think it came out well.

[1] https://claude.ai/public/artifacts/1b921a50-897e-4d9e-8cfa-0...

New comment by ilyakaminsky in "Fast"

ilyakaminsky — Wed, 30 Jul 2025 19:54:30 +0000

Fast is also cheap. Especially in the world of cloud computing where you pay by the second. The only way I could create a profitable transcription service [1] that undercuts the rest was by optimizing every little thing along the way. For instance, just yesterday I learned that the image I've put together is 2.5× smaller than the next open source variant. That means faster cold boots, which reduces the cost (and provides a better service).

[1] https://speechischeap.com

New comment by ilyakaminsky in "Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic"

ilyakaminsky — Tue, 22 Jul 2025 18:08:28 +0000

Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way: VAD, inference server, hardware optimization, etc. But nothing that would compromise on quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without incurring any losses to the WER.

New comment by ilyakaminsky in "Complete silence is always hallucinated as "ترجمة نانسي قنقر" in Arabic"

ilyakaminsky — Tue, 22 Jul 2025 11:18:10 +0000

I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.

[1] https://speechischeap.com
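To illustrate the idea of gating audio with VAD before it reaches Whisper, here is a minimal sketch. It is an assumption-laden stand-in, not the pipeline described above: it uses a simple per-frame energy threshold rather than a trained VAD model, and the function names, the 20 ms frame size, and the 0.01 threshold are all hypothetical choices.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def speech_segments(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Return (start_sec, end_sec) spans whose frame energy exceeds threshold.

    Hypothetical energy-based detector: a real VAD model would replace the
    `frame_rms(frame) > threshold` test with a learned speech probability.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None  # sample index where the current voiced run began
    for i in range(0, len(samples), frame_len):
        voiced = frame_rms(samples[i:i + frame_len]) > threshold
        if voiced and start is None:
            start = i
        elif not voiced and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments


# One second of silence, one second of a 440 Hz tone, one second of silence.
rate = 16000
silence = [0.0] * rate
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
audio = silence + tone + silence

print(speech_segments(audio, rate))  # [(1.0, 2.0)]
```

Only the returned spans would be sent on for transcription, so the model never sees the pure-silence stretches where it tends to hallucinate.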

New comment by ilyakaminsky in "OpenAI charges by the minute, so speed up your audio"

ilyakaminsky — Tue, 08 Jul 2025 16:16:08 +0000

Not yet. The gains in efficiency come from optimizing the speedup factor. Real-time audio cannot be processed any faster than 1× by definition.

New comment by ilyakaminsky in "OpenAI charges by the minute, so speed up your audio"

ilyakaminsky — Thu, 26 Jun 2025 07:04:42 +0000

It's sustainable, but not enough to retire on at this point.

> Just wondering if I cam build a retirement out of APIs :)

I think it's possible, but you need to find a way to add value beyond the commodity itself (e.g., audio classification and speaker diarization in my case).

New comment by ilyakaminsky in "OpenAI charges by the minute, so speed up your audio"

ilyakaminsky — Wed, 25 Jun 2025 18:39:37 +0000

I've already done that [1]. A fraction of the price, 24-hour limit per file, and speedup tricks like the OP's are welcome. :)

[1] https://speechischeap.com

Show HN: I built Speech is Cheap for fast, long-form audio transcription

ilyakaminsky — Tue, 06 May 2025 17:29:59 +0000

Hi HN, I created a transcription service called Speech is Cheap. I put in a lot of effort to make it fast without losing too much accuracy. For instance, it takes 10 minutes to transcribe ~21 hours of Little Women [1]. I also just released the add-ons to classify audio, diarize speakers, and filter out segments based on the models' confidence scores. So it takes about a minute to process the noisy, hour-long Starship 8 launch with all those features enabled [2]. Transcribing these respective files costs $1.16 and $0.11 for subscribers or $2.50 and $0.23 for pay-as-you-go users.

I released the MVP about six months ago. But some users wanted to parse speakers, so I spent the last couple of months reworking the entire pipeline. Initially, the biggest challenge was creating the custom voice activity detection (VAD) functionality. Once that was done, I got more confident and incorporated a powerful diarization model as well. The rest of the time was spent on fine-tuning and optimizing everything end-to-end. I learned a ton and will blog about it soon.

Sharing Speech is Cheap with HN is a big step for me. My main focus has been on the engineering side, so I'm a bit puzzled by how to properly market it. If you have experience with what genuinely works to spread the word, I'd be very grateful to hear your thoughts and perspectives.

You can try it out completely for free by picking the Pay-as-you-go option and applying the `HN5` promo code for $5 off, which is good for 2500 minutes of regular transcriptions. I'll stick around to answer any questions.

[1] https://youtu.be/yduuUGUj5Bg » https://cdn.speechischeap.com/out/little_women.json

[2] https://x.com/SpaceX/status/1897438948458189156 » https://cdn.speechischeap.com/out/starship8.json
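As a rough sanity check on the numbers above, the pay-as-you-go rate can be backed out of the promo code: $5 for 2500 minutes of regular transcription. The sketch below assumes a flat per-minute rate with no add-ons, which is an inference from the post and not a published price list. The function names are hypothetical.

```python
# Figures taken from the post: $5 of credit covers 2500 regular minutes.
PROMO_CREDIT_USD = 5.00
PROMO_MINUTES = 2500


def payg_rate_per_minute():
    """Implied pay-as-you-go rate, assuming flat per-minute pricing."""
    return PROMO_CREDIT_USD / PROMO_MINUTES


def estimate_cost(hours):
    """Estimated pay-as-you-go cost in USD for `hours` of regular audio."""
    return round(hours * 60 * payg_rate_per_minute(), 2)


print(payg_rate_per_minute())  # 0.002
print(estimate_cost(21))       # 2.52
```

The estimate for ~21 hours lands close to the $2.50 quoted for Little Women; the small gap is expected since the duration is approximate.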

Comments URL: https://news.ycombinator.com/item?id=43907594

Points: 1

# Comments: 0