<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: suprjami</title><link>https://news.ycombinator.com/user?id=suprjami</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 09:10:09 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=suprjami" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by suprjami in "I've been waiting over a month for Anthropic to respond to my billing issue"]]></title><description><![CDATA[
<p>"Anthropic CEO Says AI Could Replace Software Engineers in 6 to 12 Months"<p><a href="https://www.entrepreneur.com/business-news/ai-ceo-says-software-engineers-could-be-replaced-in-months/502087" rel="nofollow">https://www.entrepreneur.com/business-news/ai-ceo-says-softw...</a></p>
]]></description><pubDate>Thu, 09 Apr 2026 09:00:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47701010</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47701010</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47701010</guid></item><item><title><![CDATA[New comment by suprjami in "Git commands I run before reading any code"]]></title><description><![CDATA[
<p>Nice to see a fellow Tornhill fan. I loved his early C articles.</p>
]]></description><pubDate>Wed, 08 Apr 2026 23:11:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47697374</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47697374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47697374</guid></item><item><title><![CDATA[New comment by suprjami in "The Australian government has announced gambling advertising reforms"]]></title><description><![CDATA[
<p>I would, but the rest of Australia wouldn't. This country has an unhealthy relationship with drinking.</p>
]]></description><pubDate>Sat, 04 Apr 2026 21:27:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47643605</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47643605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47643605</guid></item><item><title><![CDATA[New comment by suprjami in "Lemonade by AMD: a fast and open source local LLM server using GPU and NPU"]]></title><description><![CDATA[
<p>Yes, AMD themselves even use Vulkan token generation (tg) numbers in their marketing material, because Vulkan is faster than ROCm on everything from RDNA2 onwards (which seems embarrassing).<p>However, for prompt processing (pp), Vulkan is still nowhere near ROCm. That matters for long context and/or quick responses. A lot of people really care about time-to-first-token.</p>
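<p>If you want to measure the two phases on your own card, llama.cpp's llama-bench reports pp and tg separately. A minimal example (the model path is a placeholder; flags as of recent llama.cpp):<p><pre><code># compare a Vulkan build against a ROCm/HIP build of llama.cpp
# pp512 = prompt processing, tg128 = token generation
./llama-bench -m model.gguf -p 512 -n 128</code></pre></p>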
]]></description><pubDate>Fri, 03 Apr 2026 03:06:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47622796</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47622796</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47622796</guid></item><item><title><![CDATA[New comment by suprjami in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>Following the current rule of thumb for MoE capacity, `sqrt(total_params * active_params)`, a 200B-A3B would have the intelligence of a ~24B dense model.<p>That seems pointless. You can already achieve that with a single 24G graphics card.<p>I wonder if the rule would even hold up at that level, as 3B active is really not a lot to work with. Qwen 3.5 uses 122B-A10B and is still only neck and neck with the 27B dense model.<p>I don't see any value proposition for these little boxes like DGX Spark and Strix Halo. Lots of RAM that's too slow to do anything useful except run mergekit. imo you'd be better off building a desktop computer with two 3090s.</p>
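<p>A quick sketch of that heuristic in Python (my own illustration; it's a community rule of thumb, not an official formula):<p><pre><code>import math

def dense_equivalent(total_b: float, active_b: float) -> float:
    """Rule of thumb: a MoE behaves roughly like a dense model of
    sqrt(total * active) parameters (the geometric mean)."""
    return math.sqrt(total_b * active_b)

print(dense_equivalent(200, 3))   # ~24.5 -> a 200B-A3B ~ 24B dense
print(dense_equivalent(122, 10))  # ~34.9 -> Qwen 3.5 122B-A10B ~ 35B dense</code></pre></p>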
]]></description><pubDate>Thu, 02 Apr 2026 21:42:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47620571</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47620571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47620571</guid></item><item><title><![CDATA[New comment by suprjami in "The Australian government has announced gambling advertising reforms"]]></title><description><![CDATA[
<p>I'd strongly support a year-based ban on cigarette purchases.<p>Set the purchase cutoff at the birth year of people turning 18 today. So DOB 2008 if done now; anyone born 2009 or later could never buy smokes at all.<p>Within two generations we'd largely eliminate smoking. Within three, cigarettes would be almost impossible to get. Great public health initiative.</p>
]]></description><pubDate>Thu, 02 Apr 2026 21:01:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47620140</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47620140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47620140</guid></item><item><title><![CDATA[New comment by suprjami in "Lemonade by AMD: a fast and open source local LLM server using GPU and NPU"]]></title><description><![CDATA[
<p>You have got to be joking.<p>My three NVIDIA cards are more power efficient than my one AMD card, both at idle and under load.<p>Official ROCm is like pulling teeth, with poor support for desktop cards. Debian, a volunteer-led project, has better ROCm CI than AMD and supports more cards.<p>Look at any benchmarks. NV midrange cards are faster than AMD's and at least a generation ahead. Owning a 7900XTX is an embarrassing disappointment.<p>I like AMD and want them to succeed, but they are way behind NV in this area.</p>
]]></description><pubDate>Thu, 02 Apr 2026 20:50:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47619989</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47619989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47619989</guid></item><item><title><![CDATA[New comment by suprjami in "From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem"]]></title><description><![CDATA[
<p>Some models suffer badly from KV quantisation. You can also take a speed hit from using dissimilar K and V types.<p>TurboQuant seems to be the next big thing in context memory usage: polar coordinates achieving a ~5x reduction in memory usage with minimal or no quality loss, and even a slight speedup in some cases.</p>
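<p>For anyone sanity-checking the per-token numbers, the standard KV cache size is straightforward to compute. A minimal Python sketch (the dimensions below are Llama-3-8B-like and are my assumption, not from the article):<p><pre><code>def kv_bytes_per_token(n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    # one K and one V vector per layer, per KV head
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# 32 layers, 8 KV heads (GQA), head_dim 128
print(kv_bytes_per_token(32, 8, 128, 2))  # fp16: 131072 B = 128 KiB/token
print(kv_bytes_per_token(32, 8, 128, 1))  # 8-bit KV: 64 KiB/token</code></pre></p>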
]]></description><pubDate>Tue, 31 Mar 2026 19:47:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47592474</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47592474</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47592474</guid></item><item><title><![CDATA[New comment by suprjami in "Quantization from the Ground Up"]]></title><description><![CDATA[
<p>Dual 3060s run 24B Q6 and 32B Q4 at ~15 tok/sec. That's fast enough to be usable.<p>Add a third one and you can run Qwen 3.5 27B Q6 with 128k ctx, for less than the price of a 3090.</p>
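<p>The back-of-envelope VRAM math, if anyone wants to check the fit (a rough Python sketch; the bits-per-weight values are approximate GGUF figures and ignore KV cache and runtime overhead):<p><pre><code>def model_gib(params_b: float, bits_per_weight: float) -> float:
    # approximate GGUF weight size: params * bpw / 8, in GiB
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

print(f"24B Q6_K   ~ {model_gib(24, 6.56):.1f} GiB")  # ~18.3 GiB
print(f"32B Q4_K_M ~ {model_gib(32, 4.85):.1f} GiB")  # ~18.1 GiB
# both fit across 2x12G = 24G, leaving a little room for context</code></pre></p>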
]]></description><pubDate>Thu, 26 Mar 2026 11:17:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47529095</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47529095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47529095</guid></item><item><title><![CDATA[New comment by suprjami in "Plasma Bigscreen – 10-foot interface for KDE plasma"]]></title><description><![CDATA[
<p>Look up "USB RF remote" on eBay. There are two common ones you'll see everywhere. I have one for my Kodi system.</p>
]]></description><pubDate>Sat, 07 Mar 2026 09:58:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47286146</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47286146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47286146</guid></item><item><title><![CDATA[New comment by suprjami in "Anthropic, please make a new Slack"]]></title><description><![CDATA[
<p>Please, anyone, make a new Slack. 4GB of RAM for a slow chat client with a bad interface is just so slovenly it should be illegal.</p>
]]></description><pubDate>Fri, 06 Mar 2026 23:44:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47282617</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47282617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47282617</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Yes, that's right. The config is described by the developer here:<p><a href="https://www.reddit.com/r/LocalLLaMA/comments/1rhohqk/comment/o8070gg/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/1rhohqk/comment...</a><p>And it's in the sample config too:<p><a href="https://github.com/mostlygeek/llama-swap/blob/main/config.example.yaml" rel="nofollow">https://github.com/mostlygeek/llama-swap/blob/main/config.ex...</a><p>iiuc MLX quants are not GGUFs for llama.cpp; they're a different file format which you use with the MLX inference server. LM Studio abstracts all that away, so you can just pick an MLX quant and it does the hard work for you. I don't have a Mac, so I haven't looked into this in detail.</p>
]]></description><pubDate>Mon, 02 Mar 2026 08:40:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47215345</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47215345</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47215345</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Shouldn't you be using MLX, since it's optimised for Apple Silicon?<p>Many user benchmarks report up to 30% better memory usage and up to 50% higher token generation speed:<p><a href="https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_ships_an_mlx_backend_run_any_llm_from/" rel="nofollow">https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_s...</a><p>As the post says, LM Studio has an MLX backend which makes it easy to use.<p>If you still want to stick with llama-server and GGUF, look at llama-swap, which lets you run one frontend that presents a list of models and dynamically starts a llama-server process with the right model, as sketched below:<p><a href="https://github.com/mostlygeek/llama-swap" rel="nofollow">https://github.com/mostlygeek/llama-swap</a><p>(actually you could run any OpenAI-compatible server process with llama-swap)</p>
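<p>The config shape is roughly like this (a minimal sketch from memory; the model name and path are placeholders, and config.example.yaml in the repo is the authoritative syntax):<p><pre><code># llama-swap starts the matching llama-server on demand
# and proxies OpenAI-compatible requests to it
models:
  "qwen3.5-35b":
    cmd: llama-server --port 9001 -m /models/qwen3.5-35b-q4_k_m.gguf
    proxy: http://127.0.0.1:9001</code></pre></p>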
]]></description><pubDate>Mon, 02 Mar 2026 04:59:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47214054</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47214054</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47214054</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Ah, thanks.<p>The names are so good and not at all repetitious.<p>No, not the RTX 6000. No, not the A6000...</p>
]]></description><pubDate>Sun, 01 Mar 2026 13:17:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206445</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47206445</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206445</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Unsloth Dynamic. Don't bother with anything else.</p>
]]></description><pubDate>Sat, 28 Feb 2026 22:11:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47200839</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47200839</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47200839</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>The cheapest option is two 3060 12G cards. You'll be able to fit the Q4 of the 27B or 35B with an okay context window.<p>If you want to spend twice as much for more speed, get a 3090/4090/5090.<p>If you want long context, get two of them.<p>If you have enough spare cash to buy a car, get a 96G RTX PRO 6000 Blackwell.</p>
]]></description><pubDate>Sat, 28 Feb 2026 22:01:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47200765</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47200765</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47200765</guid></item><item><title><![CDATA[New comment by suprjami in "Tell HN: YC companies scrape GitHub activity, send spam emails to users"]]></title><description><![CDATA[
<p>Big deal, so does every other company.<p>If you're lonely, just upload a few AI keywords to a repo. You'll get emails forever.</p>
]]></description><pubDate>Thu, 26 Feb 2026 20:20:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47171498</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47171498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47171498</guid></item><item><title><![CDATA[New comment by suprjami in "Terence Tao, at 8 years old (1984) [pdf]"]]></title><description><![CDATA[
<p>At 8 years old I was able to expertly dismantle many radios.<p>Was still a few years away from reassembly.</p>
]]></description><pubDate>Tue, 24 Feb 2026 05:59:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47133374</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47133374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47133374</guid></item><item><title><![CDATA[New comment by suprjami in "'Peanut butter' pay raises could cost companies their top performers"]]></title><description><![CDATA[
<p>Next step is to skip the bread and eat Nutella from the jar with a spoon.</p>
]]></description><pubDate>Sun, 22 Feb 2026 20:39:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47114418</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47114418</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47114418</guid></item><item><title><![CDATA[New comment by suprjami in "Claws are now a new layer on top of LLM agents"]]></title><description><![CDATA[
<p>It feels to me like there are plenty of people running these on a "just trust the AI, bro" basis who are one hallucination away from having their entire bank account emptied.</p>
]]></description><pubDate>Sat, 21 Feb 2026 21:20:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47104895</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47104895</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47104895</guid></item></channel></rss>