<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: sanchitmonga22</title><link>https://news.ycombinator.com/user?id=sanchitmonga22</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 13 Jun 2026 19:54:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=sanchitmonga22" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Please check out our main repo: <a href="https://github.com/RunanywhereAI/runanywhere-sdks/" rel="nofollow">https://github.com/RunanywhereAI/runanywhere-sdks/</a><p>We want to run anywhere, hence the name RunAnywhere. MetalRT is the fastest inference engine we've built for Apple Silicon, and we'll be covering other edge devices as well. Every edge device is about to hit warp speed!</p>
]]></description><pubDate>Wed, 11 Mar 2026 15:06:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336593</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47336593</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336593</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Yes, that's the plan. MetalRT will ship as part of the RunAnywhere
SDK so other developers can integrate it into their own apps. We're
working on making that available. If you want to be in the early
access group, drop me a line at founder@runanywhere.ai or open an
issue on the RCLI repo. Happy to look at your project.</p>
]]></description><pubDate>Wed, 11 Mar 2026 15:04:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336564</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47336564</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336564</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>That's a fair read. Tool calling reliability with sub-4B models is
genuinely the hardest unsolved problem in on-device AI right now.<p>The inference engine (MetalRT) is production-grade and the pipeline architecture
is solid, but the models at this size are still the weak link for
complex tool routing. Larger model support (where tool calling is
much more reliable) is next on the roadmap. Please stay tuned!</p>
]]></description><pubDate>Wed, 11 Mar 2026 15:03:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336556</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47336556</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336556</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>That tracks with what we've seen too. For agent workflows with
reliable tool calling, you really do need the larger models.
Larger model support is a priority for us. Thanks for the data point.</p>
]]></description><pubDate>Wed, 11 Mar 2026 15:02:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336546</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47336546</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336546</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Fair criticism. Our benchmarks are on small models because MetalRT
was built for the voice pipeline use case, where decode latency
on 0.6B-4B models is the bottleneck.<p>You're right that the bigger opportunity on Apple Silicon is large
models that don't fit on consumer GPUs. Expanding MetalRT to 7B,
14B, 32B+ is on the roadmap. MetalRT's architectural advantages should matter
even more at that scale, where everything becomes memory-bandwidth-bound.<p>We'll publish benchmarks on larger models as we add support. If you
have a specific model/size you'd want to see first, that helps us
prioritize.</p>
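<p>To make the memory-bandwidth point concrete, here is a rough back-of-the-envelope sketch (not MetalRT code). It assumes decode reads every weight once per generated token and ignores KV-cache traffic and quantization-scale overhead, so it gives an upper bound on tokens/sec, not a prediction. The 546 GB/s figure is Apple's published peak bandwidth for the top M4 Max configuration and is an assumption here.</p>

```python
def decode_ceiling_tok_s(params_billion: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on decode tokens/sec if every weight is streamed once per token."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes


# Assumed: 4-bit weights, 546 GB/s peak bandwidth (top M4 Max configuration).
for size in (4, 7, 14, 32):
    ceiling = decode_ceiling_tok_s(size, 4, 546)
    print(f"{size}B @ 4-bit: <= {ceiling:.0f} tok/s")
# 4B: 273, 7B: 156, 14B: 78, 32B: 34
```

<p>The ceiling falls in proportion to model size, which is why per-token efficiency in the engine matters more, not less, as models grow.</p>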
]]></description><pubDate>Wed, 11 Mar 2026 15:02:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336537</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47336537</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336537</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Good correction, thanks. You're right that NAX and ANE are distinct,
I shouldn't have conflated them. NAX's ability to accelerate LLM
prefill is exactly the kind of capability that could complement
MetalRT's decode-focused pipeline. Appreciate the clarification on
the Metal 4 / Tahoe requirement too.</p>
]]></description><pubDate>Wed, 11 Mar 2026 15:00:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336516</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47336516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336516</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Yes, mobile is our primary offering, and MetalRT on iOS is on the roadmap. The same Metal GPU pipeline that powers MetalRT on macOS maps directly to iOS (same Apple Silicon,
same Metal API).</p>
]]></description><pubDate>Wed, 11 Mar 2026 04:56:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331865</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331865</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331865</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Agreed for a lot of use cases. RCLI supports text-only mode (use the --no-speak flag, or just type in the TUI instead of using push-to-talk). TTS makes sense for hands-free / eyes-free scenarios, but we don't force it.</p>
]]></description><pubDate>Wed, 11 Mar 2026 03:27:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331448</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331448</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331448</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>We use AI tools in our workflow, same as a lot of teams at this point. The pipeline architecture, Metal integration, and engine design are ours. The code is MIT and open for anyone to read and judge the quality directly.</p>
]]></description><pubDate>Wed, 11 Mar 2026 03:25:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331440</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331440</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331440</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>RCLI includes local RAG out of the box. You can ingest PDFs, DOCX, and plain text, then query by voice or text:<pre><code>rcli rag ingest ~/Documents/notes
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
</code></pre><p>It uses hybrid retrieval (vector + BM25 with Reciprocal Rank Fusion) and runs in ~4ms over 5K+ chunks. Embeddings are computed locally with Snowflake Arctic, so nothing leaves your machine.</p>
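<p>For readers unfamiliar with Reciprocal Rank Fusion: each retriever returns its own ranked list, and a document's fused score is the sum of 1/(k + rank) over the lists it appears in. A minimal sketch of the general technique, not RCLI's code; k=60 is the value commonly used in the literature, and RCLI's actual constant and chunk IDs are assumptions here.</p>

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["c3", "c1", "c7"]  # hypothetical top hits from embedding search
bm25_hits = ["c1", "c9", "c3"]    # hypothetical top hits from BM25
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['c1', 'c3', 'c9', 'c7']
```

<p>Because RRF only uses ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.</p>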
]]></description><pubDate>Wed, 11 Mar 2026 03:23:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331430</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331430</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331430</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Fair point. The install script shouldn't silently install Homebrew without explicit consent. We'll update it to detect when Homebrew is missing and prompt the user before installing anything beyond RCLI itself.<p>In the meantime, if you already have Homebrew, you can install directly:<pre><code>brew tap RunanywhereAI/rcli <a href="https://github.com/RunanywhereAI/RCLI.git" rel="nofollow">https://github.com/RunanywhereAI/RCLI.git</a>
brew install rcli
rcli setup
</code></pre><p>Or build from source if you prefer not to use either method: <a href="https://github.com/RunanywhereAI/RCLI#build-from-source" rel="nofollow">https://github.com/RunanywhereAI/RCLI#build-from-source</a></p>
]]></description><pubDate>Wed, 11 Mar 2026 03:22:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331423</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331423</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331423</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Cool, just checked out dlgo. Looks like you're targeting Go bindings for on-device inference? Different approach but same conviction that this should run locally. Happy to compare notes if you want to chat about Metal optimization or pipeline architecture.</p>
]]></description><pubDate>Wed, 11 Mar 2026 02:08:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331063</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331063</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331063</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Apple has the silicon, the frameworks (MLX, CoreML), and the models. The gap is putting it all together into a fast, unified on-device pipeline. That's what we're focused on, and honestly, we think Apple will eventually ship something similar natively. Until then, we're trying to show what's possible today on their hardware.</p>
]]></description><pubDate>Wed, 11 Mar 2026 02:07:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331055</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331055</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331055</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Absolutely, we'd welcome a Portfile contribution. Happy to review and merge. If halostatue wants to co-maintain, even better.<p>Feel free to open a PR or issue on the RCLI repo and we'll coordinate.</p>
]]></description><pubDate>Wed, 11 Mar 2026 02:06:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331044</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331044</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331044</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Understood, you want dictation, not a chatbot. That's a valid and different use case.<p>RCLI is Apple Silicon only today because MetalRT is built on Metal. For Linux, the closest thing to what you're describing would be building a virtual input device on top of Whisper or Parakeet (which RCLI supports as STT backends). Parakeet TDT 0.6B has ~1.9% WER, which is very close to production dictation quality.<p>The missing piece on Linux isn't the model, it's the integration: a daemon that captures mic audio, runs STT with low perceived latency (streaming partial results), and injects text as keyboard input. sherpa-onnx (<a href="https://github.com/k2-fsa/sherpa-onnx" rel="nofollow">https://github.com/k2-fsa/sherpa-onnx</a>) supports Linux and has streaming STT; it might be the best starting point for what you're after.<p>We're focused on Apple Silicon for now, but broader platform support is on the roadmap.</p>
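<p>One part of that integration that's easy to get wrong is injecting streaming partials: the recognizer revises earlier words as more audio arrives, so the daemon should backspace over only the part that changed instead of retyping everything. A minimal sketch of that diffing step; the helper names are hypothetical, the partial transcripts are made up, and the actual keystroke injection (for example through a Linux uinput device) is left out.</p>

```python
def diff_partial(typed: str, partial: str) -> tuple[int, str]:
    """Return (backspaces, text) that turn what's already typed into the new partial."""
    common = 0
    for a, b in zip(typed, partial):
        if a != b:
            break
        common += 1
    return len(typed) - common, partial[common:]


def apply_keystrokes(typed: str, backspaces: int, text: str) -> str:
    """Simulate what the focused text field contains after injection."""
    return typed[:len(typed) - backspaces] + text


typed = ""
# Hypothetical partial results from a streaming recognizer; the last one revises "word".
for partial in ["hel", "hello", "hello word", "hello world"]:
    backspaces, text = diff_partial(typed, partial)
    print(f"backspace x{backspaces}, type {text!r}")
    typed = apply_keystrokes(typed, backspaces, text)
```

<p>Only the last update needs a correction (one backspace, then "ld"), so the user sees text appear incrementally rather than flicker.</p>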
]]></description><pubDate>Wed, 11 Mar 2026 02:04:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331039</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331039</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331039</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>This is a great idea. A virtual audio device that sits in the path of any audio stream and provides live transcription would be huge for video conferencing, lectures, and podcasts.<p>MetalRT's STT numbers make this feasible: transcribing 70 seconds of audio in 101ms means you could process audio chunks in real time with massive headroom. The latency would be imperceptible.<p>We haven't built this yet, but it's a compelling use case. Core Audio supports virtual audio devices (implemented as Audio Server / HAL plug-ins) that could pipe audio through the pipeline. If anyone in this thread has experience building macOS audio HAL plugins and wants to collaborate, we're very open to contributions; RCLI is MIT-licensed.</p>
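<p>A quick way to quantify that headroom is the real-time factor (RTF): processing time divided by audio duration, where anything below 1 is faster than real time. The sketch below only does the arithmetic on the 70 s / 101 ms figure above. The per-chunk estimate assumes cost scales linearly with audio length, which ignores fixed per-call overhead, so treat it as optimistic; the 500 ms chunk size is an arbitrary illustrative choice.</p>

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF < 1 means transcription runs faster than the audio plays."""
    return processing_s / audio_s


rtf = real_time_factor(0.101, 70.0)  # 101 ms to transcribe 70 s of audio
speedup = 1 / rtf
# Assumption: processing time scales linearly with chunk length (no fixed overhead).
chunk_ms = 500 * rtf
print(f"RTF = {rtf:.5f}, ~{speedup:.0f}x faster than real time")
print(f"~{chunk_ms:.2f} ms of compute per 500 ms audio chunk")
```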
]]></description><pubDate>Wed, 11 Mar 2026 02:03:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331030</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331030</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331030</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>This is exactly the problem we're trying to solve. The models themselves have gotten surprisingly capable at small sizes (Qwen3.5 4B with 262K context, LFM2 1.2B for fast tool calling), but the inference infrastructure hasn't kept up.<p>When people say "local AI is too slow," they usually mean the engine is too slow, not the model. A 4B model at 186 tok/s (MetalRT on M4 Max) feels genuinely responsive for interactive chat. The same model at 87 tok/s (llama.cpp) feels sluggish. Same weights, same quality, 2x the speed: that's a usability cliff.<p>We think the gap between cloud and on-device inference is an infrastructure problem, not a model problem. That's what we're working on.</p>
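<p>To see why that gap is felt so directly, convert the two throughput figures above into per-token latency and time to finish a reply. The 200-token reply length is an illustrative assumption, and the numbers cover decode only (no prefill or time-to-first-token).</p>

```python
def response_time_s(tokens: int, tok_per_s: float) -> float:
    """Decode-only time to emit `tokens` at a steady rate."""
    return tokens / tok_per_s


REPLY_TOKENS = 200  # assumed length of a typical chat reply

for engine, rate in (("MetalRT", 186.0), ("llama.cpp", 87.0)):
    per_token_ms = 1000 / rate
    print(f"{engine}: {per_token_ms:.1f} ms/token, "
          f"{response_time_s(REPLY_TOKENS, rate):.2f} s for {REPLY_TOKENS} tokens")

print(f"speedup: {186.0 / 87.0:.2f}x")
```

<p>Roughly 1.1 s versus 2.3 s for the same reply is the difference between a response that feels immediate and one you wait for.</p>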
]]></description><pubDate>Wed, 11 Mar 2026 02:02:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331022</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331022</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>The default TTS voice (Piper) is a lightweight model optimized for speed over quality. It's fast, but yeah, it doesn't sound great.<p>If you install Kokoro TTS (rcli models > TTS section), the voice quality is dramatically better. It's a neural TTS model with 28 different voices. MetalRT synthesizes short responses with Kokoro in about 178ms, so you don't pay a speed penalty for the upgrade.<p>We should probably make Kokoro the default, or at least make the upgrade path more obvious in the first-run experience. Fair feedback.</p>
]]></description><pubDate>Wed, 11 Mar 2026 02:01:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331013</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331013</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331013</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Fair criticism. The action executed on the LLM side but didn't translate into the correct macOS action: the model hallucinated success instead of routing to the open_url tool.<p>This is a known limitation with small LLMs (0.6B-1.2B) doing tool calling. They sometimes confuse "I know what you want" with "I did it." Upgrading to a larger model improves tool-calling accuracy significantly.<p>We're also working on verification: having the pipeline confirm the action actually succeeded before reporting back. That's a fair expectation and we should meet it.</p>
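<p>The verification idea can be sketched as a wrapper that runs a tool, then checks an independent postcondition, and only reports success if that check passes. This is a minimal illustration of the pattern, not RCLI's implementation: <code>ToolResult</code>, <code>run_with_verification</code>, and the fake <code>browser</code> state are all hypothetical, and what the check looks like for a real macOS action is outside this sketch.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    ok: bool
    message: str


def run_with_verification(action: Callable[[], None],
                          verify: Callable[[], bool],
                          description: str) -> ToolResult:
    """Execute an action, then report success only if an independent check passes."""
    try:
        action()
    except Exception as exc:  # surface the failure instead of claiming success
        return ToolResult(False, f"{description} failed: {exc}")
    if not verify():
        return ToolResult(False, f"{description} did not take effect")
    return ToolResult(True, f"{description} done")


browser = {"url": None}  # fake app state standing in for a real browser
target = "https://google.com"


def open_url() -> None:  # hypothetical stand-in for an open_url tool
    browser["url"] = target


def url_is_open() -> bool:
    return browser["url"] == target


ok_result = run_with_verification(open_url, url_is_open, "open google.com")
print(ok_result)

# A mis-routed call that never touches the browser is reported as a failure.
browser["url"] = None
missed_result = run_with_verification(lambda: None, url_is_open, "open google.com")
print(missed_result)
```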
]]></description><pubDate>Wed, 11 Mar 2026 02:00:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331009</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331009</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331009</guid></item><item><title><![CDATA[New comment by sanchitmonga22 in "Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon"]]></title><description><![CDATA[
<p>Thanks for trying it and for filing the bug. We're looking into the Homebrew install issue.<p>On unsloth quants: agreed, they're consistently better bit-for-bit. Adding broader quantization format support (including unsloth's approach) is on the roadmap. Right now MetalRT works with MLX 4-bit files and GGUF Q4_K_M; we want to expand that.<p>On the grounding issue ("navigate to google.com" not actually navigating): you're right, that's a gap. The "open_url" action exists, but the LLM doesn't always route to it correctly, especially with compound commands. Small models (0.6B-1.2B) have limited tool-calling accuracy; upgrading to Qwen3.5 4B via rcli upgrade-llm helps significantly. We're also improving the action routing prompts.<p>Appreciate the detailed feedback; this is exactly what we need.</p>
]]></description><pubDate>Wed, 11 Mar 2026 01:59:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331005</link><dc:creator>sanchitmonga22</dc:creator><comments>https://news.ycombinator.com/item?id=47331005</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331005</guid></item></channel></rss>