<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: aazo11</title><link>https://news.ycombinator.com/user?id=aazo11</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 05 May 2026 08:26:39 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=aazo11" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by aazo11 in "The Accuracy of On-Device LLMs"]]></title><description><![CDATA[
<p>I tested on-device LLMs (Gemma, DeepSeek) across prompt cleanup, PII redaction, math, and general knowledge on my M2 Max laptop using LM Studio + DSPy.<p>Some observations:<p>- Gemma-3 was the best of the models I tested for on-device inference
- 1B models look fine at first but break down under systematic benchmarking
- 4B models can handle simple rewriting and PII redaction. They also did math reasoning surprisingly well.
- General knowledge Q&A does not work with a local model. It might work with a RAG pipeline or additional tools.<p>I plan on training and fine-tuning 1B models to see if I can build high-accuracy, task-specific models under 1GB in the future.</p>
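<p>For reference, wiring DSPy to a model served by LM Studio takes only a few lines. A minimal sketch, assuming LM Studio's OpenAI-compatible server is running on its default port; the model identifier below is illustrative, use whatever id LM Studio shows for the model you have loaded:</p>
<pre><code>import dspy

# LM Studio exposes an OpenAI-compatible endpoint on localhost:1234 by default.
lm = dspy.LM(
    "openai/gemma-3-4b-it",       # illustrative name; match your loaded model
    api_base="http://localhost:1234/v1",
    api_key="lm-studio",          # LM Studio accepts any non-empty key
)
dspy.configure(lm=lm)

# One of the benchmarked tasks: PII redaction as a single-signature module.
redact = dspy.Predict("text -> redacted_text")
print(redact(text="Call Jane Doe at 555-0199.").redacted_text)
</code></pre>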
]]></description><pubDate>Wed, 21 May 2025 16:39:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44053301</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=44053301</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44053301</guid></item><item><title><![CDATA[The Accuracy of On-Device LLMs]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5">https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44053300">https://news.ycombinator.com/item?id=44053300</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Wed, 21 May 2025 16:39:43 +0000</pubDate><link>https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=44053300</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44053300</guid></item><item><title><![CDATA[New comment by aazo11 in "AI's Version of Moore's Law"]]></title><description><![CDATA[
<p>The trend is that the length of tasks AI can complete is doubling every 7 months.<p>Accompanying YouTube video: <a href="https://www.youtube.com/watch?v=evSFeqTZdqs" rel="nofollow">https://www.youtube.com/watch?v=evSFeqTZdqs</a></p>
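<p>A back-of-the-envelope illustration of how quickly that compounds (my arithmetic, not METR's numbers):</p>
<pre><code># Doubling every 7 months means a growth factor of 2**(t/7) after t months.
for months in (7, 12, 24, 36):
    print(f"{months} months -> {2 ** (months / 7):.1f}x")
# 7 months -> 2.0x, 12 -> 3.3x, 24 -> 10.8x, 36 -> 35.3x
</code></pre>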
]]></description><pubDate>Tue, 29 Apr 2025 16:56:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=43835147</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43835147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43835147</guid></item><item><title><![CDATA[AI's Version of Moore's Law]]></title><description><![CDATA[
<p>Article URL: <a href="https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/">https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43835146">https://news.ycombinator.com/item?id=43835146</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 29 Apr 2025 16:56:55 +0000</pubDate><link>https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43835146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43835146</guid></item><item><title><![CDATA[New comment by aazo11 in "Lossless LLM compression for efficient GPU inference via dynamic-length float"]]></title><description><![CDATA[
<p>This is a huge unlock for on-device inference. The download time of larger models makes local inference unusable for non-technical users.</p>
]]></description><pubDate>Fri, 25 Apr 2025 21:58:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=43798917</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43798917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43798917</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>A better solution would be to train/fine-tune the smaller model on the responses of the larger model, and only push inference to the edge if the smaller model is performant and the hardware specs can handle the workload.</p>
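<p>A minimal sketch of that routing idea. Everything here is hypothetical: the eval score, the thresholds, and the psutil-based hardware check are placeholders, not any framework's API:</p>
<pre><code>import psutil  # assumption: psutil is available for a rough hardware check

def choose_backend(distilled_eval_score: float,
                   min_score: float = 0.95,
                   min_free_ram_gb: float = 8.0) -> str:
    """Serve from the edge only when the distilled model is accurate enough
    and the device has the headroom to run it; otherwise use the cloud."""
    free_gb = psutil.virtual_memory().available / 1e9
    if distilled_eval_score >= min_score and free_gb >= min_free_ram_gb:
        return "edge"   # the small fine-tuned local model
    return "cloud"      # fall back to the large hosted model
</code></pre>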
]]></description><pubDate>Mon, 21 Apr 2025 21:20:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=43756622</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43756622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43756622</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>Thanks for calling that out. It was 32GB. I updated the post as well.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:47:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43756286</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43756286</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43756286</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>Very interesting. I had not thought about gaming at all, but that makes a lot of sense.<p>I also agree the goal should not be to replace ChatGPT. I think ChatGPT is way overkill for a lot of the workloads it is handling. A good solution should probably use cloud LLM outputs to train a smaller model to deploy in the background.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:10:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755923</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755923</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>They look awesome. Will try it out.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:04:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755873</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755873</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>Exactly. Why does this not exist yet?</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:01:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755847</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755847</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755847</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>By "too hard" I do not mean getting started with them to run inference on a prompt. Ollama especially makes that quite easy. But as an application developer, I feel these platforms are too hard to build around. The main issues being: getting the correct small enough task specific model and how long it takes to download these models for the end user.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:00:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755836</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755836</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>I spent a couple of weeks trying out local inference solutions for a project. Wrote up my thoughts with some performance benchmarks in a blog post.<p>TLDR -- What these frameworks can do on off-the-shelf laptops is astounding. However, it is very difficult to find and deploy a task-specific model, and the models themselves (even with quantization) are so large that the download would kill UX for most applications.</p>
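<p>The download problem is easy to quantify. A rough calculation with illustrative model sizes and a mid-range connection:</p>
<pre><code># Approximate download times for quantized weights at 50 Mbit/s.
sizes_gb = {"1B q4": 0.7, "4B q4": 2.5, "7B q4": 4.0}  # illustrative sizes
for name, gb in sizes_gb.items():
    secs = gb * 8e9 / 50e6
    print(f"{name}: ~{secs / 60:.0f} min")
# 1B q4: ~2 min, 4B q4: ~7 min, 7B q4: ~11 min
</code></pre>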
]]></description><pubDate>Mon, 21 Apr 2025 16:42:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43753891</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43753891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43753891</guid></item><item><title><![CDATA[Local LLM inference – impressive but too hard to work with]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@aazo11/local-llm-inference-897a06cc17a2">https://medium.com/@aazo11/local-llm-inference-897a06cc17a2</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43753890">https://news.ycombinator.com/item?id=43753890</a></p>
<p>Points: 84</p>
<p># Comments: 58</p>
]]></description><pubDate>Mon, 21 Apr 2025 16:42:52 +0000</pubDate><link>https://medium.com/@aazo11/local-llm-inference-897a06cc17a2</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43753890</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43753890</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Great question! The purpose of github-assistant is to showcase the technologies that make it easy to build a tool/feature like this, not necessarily for it to be a stand-alone service. With dlt/Relta/LangGraph/assistant-ui we spun this up in about 10 days. For example:<p>- The GitHub GraphQL API limits queries to 100 items at a time and has pretty opaque secondary rate limits. Building this with cURL alone would take real effort. dlt handles all this complexity and sets up a robust pipeline by providing a connector to the GitHub API (see the sketch after this list).
- Creating a semantic layer manually from a relational dataset and leveraging it in a text-to-SQL pipeline to prevent hallucinations (similar to those we highlighted in our Medium post) would take lots of manual effort, which Relta streamlines.
- Creating a chat front-end with charts was made easy by assistant-ui<p>Hope this makes sense.</p>
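<p>To make the dlt point concrete, here is a minimal sketch of the pattern. It uses the REST issues endpoint for brevity rather than our actual GraphQL source, and the repo and table names are illustrative:</p>
<pre><code>import dlt
from dlt.sources.helpers import requests  # dlt's retry-aware requests wrapper

@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def github_issues(repo: str = "octocat/hello-world"):
    # dlt's requests helper handles retries/backoff; we follow pagination links.
    url = f"https://api.github.com/repos/{repo}/issues?per_page=100&state=all"
    while url:
        resp = requests.get(url)
        yield resp.json()
        url = resp.links.get("next", {}).get("url")

pipeline = dlt.pipeline(pipeline_name="github", destination="duckdb",
                        dataset_name="github_data")
print(pipeline.run(github_issues()))
</code></pre>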
]]></description><pubDate>Sun, 22 Dec 2024 19:02:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=42488320</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42488320</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42488320</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Yes, in the future. We share the source code in both commercial and non-commercial engagements already. Drop me a line at amir [at] relta.dev if interested.</p>
]]></description><pubDate>Sun, 22 Dec 2024 17:00:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42487566</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42487566</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42487566</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>There will be new data from the GraphQL API added over time. Would love your feedback on which data you'd like to see added: <a href="https://docs.github.com/en/graphql" rel="nofollow">https://docs.github.com/en/graphql</a></p>
]]></description><pubDate>Sun, 22 Dec 2024 12:42:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=42485980</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42485980</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42485980</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>No, this currently only answers questions from the GitHub GraphQL API.</p>
]]></description><pubDate>Sun, 22 Dec 2024 12:30:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=42485942</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42485942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42485942</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>We pull data from the GitHub API, which includes data that is not available on GitHub.com pages. Currently only PR, issue, commit, and star data is being loaded. You can also read more here: <a href="https://medium.com/relta/github-assistant-49ae388ad758" rel="nofollow">https://medium.com/relta/github-assistant-49ae388ad758</a></p>
]]></description><pubDate>Sun, 22 Dec 2024 12:29:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42485937</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42485937</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42485937</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Was able to reproduce and pushed an update. Thanks for calling this out.</p>
]]></description><pubDate>Sun, 22 Dec 2024 04:56:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=42484469</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42484469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42484469</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Hi -- strange that didn't work. Overall, the semantic layer is designed to provide very tight guardrails and not hallucinate. You can see the agent suggest changes to the semantic layer if you give the produced answer a thumbs down.<p>The idea is for the system to provide answers that have close to 100% accuracy, but make it a single click for developers to improve the semantic layer.</p>
]]></description><pubDate>Sun, 22 Dec 2024 01:34:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=42483786</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42483786</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42483786</guid></item></channel></rss>