<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mongrelion</title><link>https://news.ycombinator.com/user?id=mongrelion</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 16 Jun 2026 16:37:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mongrelion" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mongrelion in "The RAM shortage could last years"]]></title><description><![CDATA[
<p><i>letting the market set prices ensures that the chips go to the critical markets and uses.</i><p>Can you please elaborate on what you mean by "critical market"?<p>Edit: formatting</p>
]]></description><pubDate>Mon, 20 Apr 2026 10:28:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47832379</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47832379</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47832379</guid></item><item><title><![CDATA[New comment by mongrelion in "The local LLM ecosystem doesn’t need Ollama"]]></title><description><![CDATA[
<p>llama.cpp moves too quickly to be added as a stable package. Instead, you can get it directly from AUR: <a href="https://aur.archlinux.org/packages?O=0&K=llama.cpp" rel="nofollow">https://aur.archlinux.org/packages?O=0&K=llama.cpp</a><p>There are packages for Vulkan, ROCm and CUDA. They all work.</p>
]]></description><pubDate>Thu, 16 Apr 2026 08:06:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47790096</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47790096</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47790096</guid></item><item><title><![CDATA[New comment by mongrelion in "Reallocating $100/Month Claude Code Spend to Zed and OpenRouter"]]></title><description><![CDATA[
<p>So far I have been happy with the value that Copilot brought, but for the past few weeks I have felt the chokehold on the number of requests.<p>I have had the chance to test the main Chinese models through OpenRouter. The pay-as-you-go model is expensive compared to a subscription model, but I don't want to be married to a single provider either.<p>Thanks for bringing OpenCode Go to my attention. Your comparison is the research I didn't know I needed, and I will be cancelling my Copilot subscription to replace it with OpenCode Go right away.</p>
]]></description><pubDate>Thu, 09 Apr 2026 21:51:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47710667</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47710667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47710667</guid></item><item><title><![CDATA[New comment by mongrelion in "Nvim-treesitter (13K+ Stars) is Archived"]]></title><description><![CDATA[
<p>It's clear to me that the maintainer is referring to "shushtain" and those types of people.<p>> when they take that tone with you.<p>This makes it sound as if you took it personally?</p>
]]></description><pubDate>Sun, 05 Apr 2026 11:00:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47648155</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47648155</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47648155</guid></item><item><title><![CDATA[New comment by mongrelion in "Nvim-treesitter (13K+ Stars) is Archived"]]></title><description><![CDATA[
<p>Having a bad day does not entitle you to take it out on others</p>
]]></description><pubDate>Sun, 05 Apr 2026 10:56:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47648135</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47648135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47648135</guid></item><item><title><![CDATA[New comment by mongrelion in "Nvim-treesitter (13K+ Stars) is Archived"]]></title><description><![CDATA[
<p>You should totally post this on the original thread just for adjustment :-)</p>
]]></description><pubDate>Sun, 05 Apr 2026 10:55:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47648131</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47648131</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47648131</guid></item><item><title><![CDATA[New comment by mongrelion in "$500 GPU outperforms Claude Sonnet on coding benchmarks"]]></title><description><![CDATA[
<p>I am definitely looking forward to TurboQuant. It makes me feel like my current setup is an investment that could pay off over time. Imagine being able to run models like MiniMax M2.5 locally at Q4 levels. That would be swell.</p>
]]></description><pubDate>Fri, 27 Mar 2026 17:56:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47546047</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47546047</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47546047</guid></item><item><title><![CDATA[New comment by mongrelion in "$500 GPU outperforms Claude Sonnet on coding benchmarks"]]></title><description><![CDATA[
<p>Not the answer that you are looking for, but I am a fellow AMD GPU owner, so I want to share my experience.<p>I have a 9070 XT, which has 16GB of VRAM.
My understanding from reading around a bunch of forums is that the smallest quant you want to go with is Q4. Below that, the compression starts hurting the results quite a lot, especially for agentic coding. The model might eventually start missing brackets, quotes, etc.<p>I tried various AI + VRAM calculators but nothing was as on point as Hugging Face's built-in functionality. You simply sign up and configure in the settings [1] which GPU you have, so that when you visit a model page, you immediately see which of the quants fit on your card.<p>Of the open source models out there, Qwen3.5 is the best right now. unsloth produces nice quants for it and even provides guidelines [2] on how to run them locally.<p>The 6-bit version of Qwen3.5 9B would fit nicely in your 6700 XT, but at 9B parameters, it probably isn't as smart as you would expect.<p>Which model have you tried locally? Also, out of curiosity, what is your host configuration?<p>[1]: <a href="https://huggingface.co/settings/local-apps" rel="nofollow">https://huggingface.co/settings/local-apps</a>
[2]: <a href="https://unsloth.ai/docs/models/qwen3.5">https://unsloth.ai/docs/models/qwen3.5</a></p>
]]></description><pubDate>Fri, 27 Mar 2026 11:14:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47541277</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47541277</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47541277</guid></item><item><title><![CDATA[New comment by mongrelion in "$500 GPU outperforms Claude Sonnet on coding benchmarks"]]></title><description><![CDATA[
<p>What is this 10€ per month subscription that you are talking about?</p>
]]></description><pubDate>Fri, 27 Mar 2026 10:26:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47540924</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47540924</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47540924</guid></item><item><title><![CDATA[New comment by mongrelion in "Can I run AI locally?"]]></title><description><![CDATA[
<p>I don't understand why I'm getting downvoted.<p>I am legitimately curious about the parameters that the person used for running the model locally to get the results they got, because I am currently experimenting with running models locally myself. You can see that I am asking others similar questions in this same thread, and you can correlate the timestamps.</p>
]]></description><pubDate>Sat, 14 Mar 2026 21:34:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47381448</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47381448</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47381448</guid></item><item><title><![CDATA[New comment by mongrelion in "Can I run AI locally?"]]></title><description><![CDATA[
<p>At what temperature did you run it and what was your context limit?</p>
]]></description><pubDate>Sat, 14 Mar 2026 10:41:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47375286</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47375286</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47375286</guid></item><item><title><![CDATA[New comment by mongrelion in "Can I run AI locally?"]]></title><description><![CDATA[
<p>Apparently there is a whole science behind running models. I have seen the instructions that unsloth publishes for their quants and, depending on the model, they'll tweak things like the temperature, top k, etc.<p>The size of the quantization you choose also makes a difference.<p>The GPU driver also plays an important role.<p>What was your approach? What software did you use to run the models?</p>
]]></description><pubDate>Fri, 13 Mar 2026 21:38:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47370304</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47370304</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47370304</guid></item><item><title><![CDATA[New comment by mongrelion in "Can I run AI locally?"]]></title><description><![CDATA[
<p>What front-end framework did you use? I find the UI so visually appealing</p>
]]></description><pubDate>Fri, 13 Mar 2026 21:31:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47370234</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47370234</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47370234</guid></item><item><title><![CDATA[New comment by mongrelion in "Can I run AI locally?"]]></title><description><![CDATA[
<p>Which quantization are you running and what context size? 32tok/s for that model on that card sounds pretty good to me!</p>
]]></description><pubDate>Fri, 13 Mar 2026 21:27:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47370161</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47370161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47370161</guid></item><item><title><![CDATA[New comment by mongrelion in "Can I run AI locally?"]]></title><description><![CDATA[
<p>It might be that the system prompt sent by Codex is not optimal for that model. Try it with OpenCode and see if your results improve.</p>
]]></description><pubDate>Fri, 13 Mar 2026 21:23:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47370118</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47370118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47370118</guid></item><item><title><![CDATA[New comment by mongrelion in "How to run Qwen 3.5 locally"]]></title><description><![CDATA[
<p>By anyone do you mean a well-established business or any entity willing to serve you?</p>
]]></description><pubDate>Mon, 09 Mar 2026 18:56:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47313675</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47313675</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47313675</guid></item><item><title><![CDATA[New comment by mongrelion in "Something is afoot in the land of Qwen"]]></title><description><![CDATA[
<p>> [...] _but not necessarily use the right format._<p>This has also been my experience. But isn't the harness sending the instructions on how to invoke a tool? Maybe it is missing the formatting part. What do you think?</p>
]]></description><pubDate>Thu, 05 Mar 2026 18:48:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47265585</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47265585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47265585</guid></item><item><title><![CDATA[New comment by mongrelion in "Ask HN: What Online LLM / Chat do you use?"]]></title><description><![CDATA[
<p>Through my Kagi subscription I get access to quite a few models [1] but I tend to rely on Qwen3 (fast) for quick questions and Qwen3 (reasoning) when I want a more structured approach, for example, when I am researching a topic.<p>I have tried the same approach with Kimi K2.5 and GLM 5 but I keep going back to Qwen3.<p>I also have access to Perplexity, which is quite decent to be honest, but I prefer to keep everything in Kagi.<p>1: <a href="https://help.kagi.com/kagi/ai/assistant.html#available-llms" rel="nofollow">https://help.kagi.com/kagi/ai/assistant.html#available-llms</a></p>
]]></description><pubDate>Tue, 03 Mar 2026 08:21:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47229691</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47229691</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47229691</guid></item><item><title><![CDATA[New comment by mongrelion in "Right-sizes LLM models to your system's RAM, CPU, and GPU"]]></title><description><![CDATA[
<p>inferbench is a great idea (similar to Geekbench, etc.), but at the time of writing it has only 83 submissions, which is underwhelming.</p>
]]></description><pubDate>Tue, 03 Mar 2026 08:10:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47229596</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47229596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47229596</guid></item><item><title><![CDATA[New comment by mongrelion in "Right-sizes LLM models to your system's RAM, CPU, and GPU"]]></title><description><![CDATA[
<p>> [...] it's much easier to fine-tune a "general" model into performing some very specific custom task (like classifying text, or translation, etc)<p>Is this fine-tuning process similar to training models? As in, do you need extensive resources? Or can this be done (realistically) on a consumer-grade GPU?</p>
]]></description><pubDate>Tue, 03 Mar 2026 08:08:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47229586</link><dc:creator>mongrelion</dc:creator><comments>https://news.ycombinator.com/item?id=47229586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47229586</guid></item></channel></rss>