Hacker News: mongrelion

New comment by mongrelion in "The RAM shortage could last years"

mongrelion — Mon, 20 Apr 2026 10:28:35 +0000

> letting the market set prices ensures that the chips go to the critical markets and uses.

Can you please elaborate on what you mean by "critical market"?

Edit: formatting

New comment by mongrelion in "The local LLM ecosystem doesn’t need Ollama"

mongrelion — Thu, 16 Apr 2026 08:06:57 +0000

llama.cpp moves too quickly to be added as a stable package. Instead, you can get it directly from the AUR: https://aur.archlinux.org/packages?O=0&K=llama.cpp

There are packages for Vulkan, ROCm and CUDA. They all work.

New comment by mongrelion in "Reallocating $100/Month Claude Code Spend to Zed and OpenRouter"

mongrelion — Thu, 09 Apr 2026 21:51:38 +0000

So far I have been happy with the value that Copilot brings, but for the past few weeks I have felt the chokehold on the number of requests.

I have had the chance to test the main Chinese models through OpenRouter. The pay-as-you-go model is expensive compared to a subscription, but I don't want to be tied to a single provider.

Thanks for bringing OpenCode Go to my attention. Your comparison is the research I didn't know I needed, and I will be cancelling my Copilot subscription to replace it with OpenCode Go right away.

New comment by mongrelion in "Nvim-treesitter (13K+ Stars) is Archived"

mongrelion — Sun, 05 Apr 2026 11:00:25 +0000

It's clear to me that the maintainer is referring to "shushtain" and people like that.

> when they take that tone with you.

This makes it sound as if you took it personally?

New comment by mongrelion in "Nvim-treesitter (13K+ Stars) is Archived"

mongrelion — Sun, 05 Apr 2026 10:56:24 +0000

Having a bad day does not entitle you to take it out on others.

New comment by mongrelion in "Nvim-treesitter (13K+ Stars) is Archived"

mongrelion — Sun, 05 Apr 2026 10:55:26 +0000

You should totally post this on the original thread just for adjustment :-)

New comment by mongrelion in "$500 GPU outperforms Claude Sonnet on coding benchmarks"

mongrelion — Fri, 27 Mar 2026 17:56:41 +0000

I am definitely looking forward to TurboQuant. It makes me feel like my current setup is an investment that could pay off over time. Imagine being able to run models like MiniMax M2.5 locally at Q4. That would be swell.

New comment by mongrelion in "$500 GPU outperforms Claude Sonnet on coding benchmarks"

mongrelion — Fri, 27 Mar 2026 11:14:31 +0000

Not the answer that you are looking for, but I am a fellow AMD GPU owner, so I want to share my experience.

I have a 9070 XT, which has 16GB of VRAM. My understanding from reading around a bunch of forums is that the smallest quant you want to go with is Q4. Below that, the compression starts hurting the results quite a lot, especially for agentic coding. The model might eventually start missing brackets, quotes, etc.

I tried various AI + VRAM calculators, but nothing was as accurate as Hugging Face's built-in feature. You simply sign up and set which GPU you have in the settings [1], so that when you visit a model page, you immediately see which of the quants fit on your card.

Of the open-source models out there, Qwen3.5 is the best right now. unsloth produces nice quants for it and even provides guidelines [2] on how to run them locally.

The 6-bit version of Qwen3.5 9B would fit nicely in your 6700 XT, but at 9B parameters, it probably isn't as smart as you would hope.
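As a rough back-of-the-envelope check on whether a quant fits: the weights alone take about parameters × bits-per-weight / 8 bytes. The sketch below is only an approximation under stated assumptions: the 2 GB headroom for KV cache and runtime overhead is a hypothetical default, real quant formats usually use a bit more than their nominal bit width, and the 12 GB figure is the 6700 XT's VRAM.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    # params * bits / 8 = bytes; with params in billions the result is in GB.
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """Rough check: weights plus a fixed headroom (hypothetical default,
    standing in for KV cache and runtime overhead) must fit in VRAM."""
    return weight_memory_gb(params_billions, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Assumed: 9B parameters at a nominal 6 bits per weight on a 12 GB card.
    print(weight_memory_gb(9, 6))          # 6.75
    print(fits_in_vram(9, 6, vram_gb=12))  # True
    # Hypothetical 30B model at 4 bits on a 16 GB card: 15 GB + 2 GB headroom.
    print(fits_in_vram(30, 4, vram_gb=16))  # False
```

Longer context windows grow the KV cache, so the headroom is the first number to revisit if a model that "fits" still runs out of memory.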

Which model have you tried locally? Also, out of curiosity, what is your host configuration?

[1]: https://huggingface.co/settings/local-apps
[2]: https://unsloth.ai/docs/models/qwen3.5

New comment by mongrelion in "$500 GPU outperforms Claude Sonnet on coding benchmarks"

mongrelion — Fri, 27 Mar 2026 10:26:45 +0000

What is this 10€ per month subscription that you are talking about?

New comment by mongrelion in "Can I run AI locally?"

mongrelion — Sat, 14 Mar 2026 21:34:09 +0000

I don't understand why I'm getting downvoted.

I am legitimately curious about the parameters that the person used to run the model locally and get the results they got, because I am currently experimenting with running models locally myself. You can see that I am asking others in this same thread similar questions; the timestamps line up.

New comment by mongrelion in "Can I run AI locally?"

mongrelion — Sat, 14 Mar 2026 10:41:58 +0000

At what temperature did you run it and what was your context limit?

New comment by mongrelion in "Can I run AI locally?"

mongrelion — Fri, 13 Mar 2026 21:38:37 +0000

Apparently there is a whole science behind running models. I have seen the instructions that unsloth publishes for their quants, and depending on the model, they tweak things like the temperature, top-k, etc.
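For anyone wondering what those two knobs actually do, here is a minimal, generic sketch of temperature plus top-k sampling: only the k highest-scoring tokens are kept, their logits are divided by the temperature, and a softmax turns them into probabilities. This is an illustration of the technique, not unsloth's or any particular runtime's implementation, and the logits in the test are made up.

```python
import math
import random


def top_k_temperature_probs(logits: list[float], temperature: float,
                            k: int) -> dict[int, float]:
    """Probabilities for the k most likely token ids after temperature scaling.
    Lower temperature sharpens the distribution; higher flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the k largest logits (top-k filtering).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax over the kept logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


def sample(logits: list[float], temperature: float, k: int,
           rng: random.Random = random.Random()) -> int:
    """Draw one token id from the filtered, temperature-scaled distribution."""
    probs = top_k_temperature_probs(logits, temperature, k)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]
```

With k=1 this degenerates to always picking the top token, which is why recommended settings for a given quant usually pair a moderate k with a model-specific temperature.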

The size of the quantization you choose also makes a difference.

The GPU driver also plays an important role.

What was your approach? What software did you use to run the models?

New comment by mongrelion in "Can I run AI locally?"

mongrelion — Fri, 13 Mar 2026 21:31:57 +0000

What front-end framework did you use? I find the UI so visually appealing.

New comment by mongrelion in "Can I run AI locally?"

mongrelion — Fri, 13 Mar 2026 21:27:10 +0000

Which quantization are you running and what context size? 32tok/s for that model on that card sounds pretty good to me!

New comment by mongrelion in "Can I run AI locally?"

mongrelion — Fri, 13 Mar 2026 21:23:03 +0000

It might be that the system prompt sent by Codex is not optimal for that model. Try OpenCode and see if your results improve.

New comment by mongrelion in "How to run Qwen 3.5 locally"

mongrelion — Mon, 09 Mar 2026 18:56:26 +0000

By anyone do you mean a well-established business or any entity willing to serve you?

New comment by mongrelion in "Something is afoot in the land of Qwen"

mongrelion — Thu, 05 Mar 2026 18:48:12 +0000

> [...] _but not necessarily use the right format._

This has also been my experience. But isn't the harness sending the instructions on how to invoke a tool? Maybe it is missing the formatting part. What do you think?

New comment by mongrelion in "Ask HN: What Online LLM / Chat do you use?"

mongrelion — Tue, 03 Mar 2026 08:21:46 +0000

Through my Kagi subscription I get access to quite a few models [1], but I tend to rely on Qwen3 (fast) for quick questions and Qwen3 (reasoning) when I want a more structured approach, for example when I am researching a topic.

I have tried the same approach with Kimi K2.5 and GLM 5, but I keep going back to Qwen3.

I also have access to Perplexity, which is quite decent to be honest, but I prefer to keep everything in Kagi.

[1]: https://help.kagi.com/kagi/ai/assistant.html#available-llms

New comment by mongrelion in "Right-sizes LLM models to your system's RAM, CPU, and GPU"

mongrelion — Tue, 03 Mar 2026 08:10:02 +0000

inferbench is a great idea (similar to Geekbench, etc.), but at the time of writing it has only 83 submissions, which is underwhelming.

New comment by mongrelion in "Right-sizes LLM models to your system's RAM, CPU, and GPU"

mongrelion — Tue, 03 Mar 2026 08:08:43 +0000

> [...] it's much easier to fine-tune a "general" model into performing some very specific custom task (like classifying text, or translation, etc)

Is this fine-tuning process similar to training models? As in, do you need extensive resources? Or can this (realistically) be done on a consumer-grade GPU?