<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: aazo11</title><link>https://news.ycombinator.com/user?id=aazo11</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 05 May 2026 08:26:39 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=aazo11" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by aazo11 in "The Accuracy of On-Device LLMs"]]></title><description><![CDATA[
<p>I tested on-device LLMs (Gemma, DeepSeek) across prompt cleanup, PII redaction, math, and general knowledge on my M2 Max laptop using LM Studio + DSPy.<p>Some observations:<p>- Gemma-3 was the best of the models I tested for on-device inference
- 1B models look fine at first but break down under systematic benchmarking
- 4B models can handle simple rewriting and PII redaction. They also did math reasoning surprisingly well.
- General knowledge Q&A does not work with a local model. It might work with a RAG pipeline or additional tools.<p>I plan on training and fine-tuning 1B models to see if I can build high-accuracy, task-specific models under 1GB in the future.</p>
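<p>For reference, wiring DSPy to a model served by LM Studio takes only a few lines. A minimal sketch, assuming LM Studio's OpenAI-compatible server is running on its default port; the model identifier below is illustrative, use whatever id LM Studio shows for the model you have loaded:</p>
<pre><code>import dspy

# LM Studio exposes an OpenAI-compatible endpoint on localhost:1234 by default.
lm = dspy.LM(
    "openai/gemma-3-4b-it",       # illustrative name; match your loaded model
    api_base="http://localhost:1234/v1",
    api_key="lm-studio",          # LM Studio accepts any non-empty key
)
dspy.configure(lm=lm)

# One of the benchmarked tasks: PII redaction as a single-signature module.
redact = dspy.Predict("text -> redacted_text")
print(redact(text="Call Jane Doe at 555-0199.").redacted_text)
</code></pre>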
]]></description><pubDate>Wed, 21 May 2025 16:39:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44053301</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=44053301</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44053301</guid></item><item><title><![CDATA[The Accuracy of On-Device LLMs]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5">https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44053300">https://news.ycombinator.com/item?id=44053300</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Wed, 21 May 2025 16:39:43 +0000</pubDate><link>https://medium.com/@aazo11/on-the-accuracy-of-on-device-llms-34fd6cc420b5</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=44053300</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44053300</guid></item><item><title><![CDATA[New comment by aazo11 in "AI's Version of Moore's Law"]]></title><description><![CDATA[
<p>The trend is that the length of tasks AI can complete is doubling every 7 months.<p>Accompanying YouTube video: <a href="https://www.youtube.com/watch?v=evSFeqTZdqs" rel="nofollow">https://www.youtube.com/watch?v=evSFeqTZdqs</a></p>
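<p>A back-of-the-envelope illustration of how quickly that compounds (my arithmetic, not METR's numbers):</p>
<pre><code># Doubling every 7 months means a growth factor of 2**(t/7) after t months.
for months in (7, 12, 24, 36):
    print(f"{months} months -> {2 ** (months / 7):.1f}x")
# 7 months -> 2.0x, 12 -> 3.3x, 24 -> 10.8x, 36 -> 35.3x
</code></pre>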
]]></description><pubDate>Tue, 29 Apr 2025 16:56:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=43835147</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43835147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43835147</guid></item><item><title><![CDATA[AI's Version of Moore's Law]]></title><description><![CDATA[
<p>Article URL: <a href="https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/">https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43835146">https://news.ycombinator.com/item?id=43835146</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 29 Apr 2025 16:56:55 +0000</pubDate><link>https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43835146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43835146</guid></item><item><title><![CDATA[New comment by aazo11 in "Lossless LLM compression for efficient GPU inference via dynamic-length float"]]></title><description><![CDATA[
<p>This is a huge unlock for on-device inference. The download time of larger models makes local inference unusable for non-technical users.</p>
]]></description><pubDate>Fri, 25 Apr 2025 21:58:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=43798917</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43798917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43798917</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>A better solution would be to train/fine-tune the smaller model on the responses of the larger model, and only push inference to the edge if the smaller model is performant and the hardware specs can handle the workload.</p>
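<p>A minimal sketch of that routing idea. Everything here is hypothetical: the eval score, the thresholds, and the psutil-based hardware check are placeholders, not any framework's API:</p>
<pre><code>import psutil  # assumption: psutil is available for a rough hardware check

def choose_backend(distilled_eval_score: float,
                   min_score: float = 0.95,
                   min_free_ram_gb: float = 8.0) -> str:
    """Serve from the edge only when the distilled model is accurate enough
    and the device has the headroom to run it; otherwise use the cloud."""
    free_gb = psutil.virtual_memory().available / 1e9
    if distilled_eval_score >= min_score and free_gb >= min_free_ram_gb:
        return "edge"   # the small fine-tuned local model
    return "cloud"      # fall back to the large hosted model
</code></pre>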
]]></description><pubDate>Mon, 21 Apr 2025 21:20:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=43756622</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43756622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43756622</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>Thanks for calling that out. It was 32GB. I updated the post as well.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:47:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43756286</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43756286</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43756286</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>Very interesting. I had not thought about gaming at all, but that makes a lot of sense.<p>I also agree the goal should not be to replace ChatGPT. I think ChatGPT is way overkill for a lot of the workloads it is handling. A good solution should probably use cloud LLM outputs to train a smaller model to deploy in the background.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:10:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755923</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755923</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>They look awesome. Will try it out.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:04:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755873</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755873</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>Exactly. Why does this not exist yet?</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:01:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755847</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755847</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755847</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>By "too hard" I do not mean getting started with them to run inference on a prompt. Ollama especially makes that quite easy. But as an application developer, I feel these platforms are too hard to build around. The main issues being: getting the correct small enough task specific model and how long it takes to download these models for the end user.</p>
]]></description><pubDate>Mon, 21 Apr 2025 20:00:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43755836</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43755836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43755836</guid></item><item><title><![CDATA[New comment by aazo11 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>I spent a couple of weeks trying out local inference solutions for a project. Wrote up my thoughts with some performance benchmarks in a blog post.<p>TLDR -- What these frameworks can do on off-the-shelf laptops is astounding. However, it is very difficult to find and deploy a task-specific model, and the models themselves (even with quantization) are so large that the download would kill UX for most applications.</p>
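<p>The download problem is easy to quantify. A rough calculation with illustrative model sizes and a mid-range connection:</p>
<pre><code># Approximate download times for quantized weights at 50 Mbit/s.
sizes_gb = {"1B q4": 0.7, "4B q4": 2.5, "7B q4": 4.0}  # illustrative sizes
for name, gb in sizes_gb.items():
    secs = gb * 8e9 / 50e6
    print(f"{name}: ~{secs / 60:.0f} min")
# 1B q4: ~2 min, 4B q4: ~7 min, 7B q4: ~11 min
</code></pre>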
]]></description><pubDate>Mon, 21 Apr 2025 16:42:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43753891</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43753891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43753891</guid></item><item><title><![CDATA[Local LLM inference – impressive but too hard to work with]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@aazo11/local-llm-inference-897a06cc17a2">https://medium.com/@aazo11/local-llm-inference-897a06cc17a2</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43753890">https://news.ycombinator.com/item?id=43753890</a></p>
<p>Points: 84</p>
<p># Comments: 58</p>
]]></description><pubDate>Mon, 21 Apr 2025 16:42:52 +0000</pubDate><link>https://medium.com/@aazo11/local-llm-inference-897a06cc17a2</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=43753890</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43753890</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Great question! The purpose of github-assistant is to showcase the technologies that make it easy to build a tool/feature like this, not necessarily for it to be a stand-alone service. With dlt/Relta/LangGraph/assistant-ui we spun this up in about 10 days. For example:<p>- The GitHub GraphQL API limits queries to 100 items at a time and has pretty opaque secondary rate limits. Building this with cURL alone would take real effort. dlt handles all this complexity and sets up a robust pipeline by providing a connector to the GitHub API (see the sketch after this list).
- Creating a semantic layer manually from a relational dataset and leveraging it in a text-to-SQL pipeline to prevent hallucinations (similar to those we highlighted in our Medium post) would take lots of manual effort, which Relta streamlines.
- Creating a chat front-end with charts was made easy by assistant-ui<p>Hope this makes sense.</p>
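<p>To make the dlt point concrete, here is a minimal sketch of the pattern. It uses the REST issues endpoint for brevity rather than our actual GraphQL source, and the repo and table names are illustrative:</p>
<pre><code>import dlt
from dlt.sources.helpers import requests  # dlt's retry-aware requests wrapper

@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def github_issues(repo: str = "octocat/hello-world"):
    # dlt's requests helper handles retries/backoff; we follow pagination links.
    url = f"https://api.github.com/repos/{repo}/issues?per_page=100&state=all"
    while url:
        resp = requests.get(url)
        yield resp.json()
        url = resp.links.get("next", {}).get("url")

pipeline = dlt.pipeline(pipeline_name="github", destination="duckdb",
                        dataset_name="github_data")
print(pipeline.run(github_issues()))
</code></pre>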
]]></description><pubDate>Sun, 22 Dec 2024 19:02:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=42488320</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42488320</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42488320</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Yes, in the future. We share the source code in both commercial and non-commercial engagements already. Drop me a line at amir [at] relta.dev if interested.</p>
]]></description><pubDate>Sun, 22 Dec 2024 17:00:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42487566</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42487566</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42487566</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>There will be new data from the GraphQL API added over time. Would love your feedback on which data you'd like to see added: <a href="https://docs.github.com/en/graphql" rel="nofollow">https://docs.github.com/en/graphql</a></p>
]]></description><pubDate>Sun, 22 Dec 2024 12:42:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=42485980</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42485980</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42485980</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>No, this currently only answers questions from the GitHub GraphQL API.</p>
]]></description><pubDate>Sun, 22 Dec 2024 12:30:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=42485942</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42485942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42485942</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>We pull data from the GitHub API, which includes data that is not available on GitHub.com pages. Currently only PR, issue, commit, and star data is being loaded. You can also read more here: <a href="https://medium.com/relta/github-assistant-49ae388ad758" rel="nofollow">https://medium.com/relta/github-assistant-49ae388ad758</a></p>
]]></description><pubDate>Sun, 22 Dec 2024 12:29:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42485937</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42485937</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42485937</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Was able to reproduce and pushed an update. Thanks for calling this out.</p>
]]></description><pubDate>Sun, 22 Dec 2024 04:56:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=42484469</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42484469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42484469</guid></item><item><title><![CDATA[New comment by aazo11 in "Show HN: GitHub-assistant – Natural language questions from your GitHub data"]]></title><description><![CDATA[
<p>Hi -- strange that didn't work. Overall, the semantic layer is designed to provide very tight guardrails and not hallucinate. You can see the agent suggest changes to the semantic layer if you give the produced answer a thumbs down.<p>The idea is for the system to provide answers that have close to 100% accuracy, but make it a single click for developers to improve the semantic layer.</p>
]]></description><pubDate>Sun, 22 Dec 2024 01:34:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=42483786</link><dc:creator>aazo11</dc:creator><comments>https://news.ycombinator.com/item?id=42483786</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42483786</guid></item></channel></rss>