<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: suprjami</title><link>https://news.ycombinator.com/user?id=suprjami</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 09:10:09 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=suprjami" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by suprjami in "I've been waiting over a month for Anthropic to respond to my billing issue"]]></title><description><![CDATA[
<p>"Anthropic CEO Says AI Could Replace Software Engineers in 6 to 12 Months"<p><a href="https://www.entrepreneur.com/business-news/ai-ceo-says-software-engineers-could-be-replaced-in-months/502087" rel="nofollow">https://www.entrepreneur.com/business-news/ai-ceo-says-softw...</a></p>
]]></description><pubDate>Thu, 09 Apr 2026 09:00:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47701010</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47701010</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47701010</guid></item><item><title><![CDATA[New comment by suprjami in "Git commands I run before reading any code"]]></title><description><![CDATA[
<p>Nice to see a fellow Tornhill fan. I loved his early C articles.</p>
]]></description><pubDate>Wed, 08 Apr 2026 23:11:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47697374</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47697374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47697374</guid></item><item><title><![CDATA[New comment by suprjami in "The Australian government has announced gambling advertising reforms"]]></title><description><![CDATA[
<p>I would, but the rest of Australia wouldn't. This country has an unhealthy relationship with drinking.</p>
]]></description><pubDate>Sat, 04 Apr 2026 21:27:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47643605</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47643605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47643605</guid></item><item><title><![CDATA[New comment by suprjami in "Lemonade by AMD: a fast and open source local LLM server using GPU and NPU"]]></title><description><![CDATA[
<p>Yes, AMD themselves even use Vulkan token generation (tg) numbers in their marketing material, because Vulkan is faster than ROCm on everything from RDNA2 onwards (which seems embarrassing).<p>However, for prompt processing (pp), Vulkan is still nowhere near ROCm. That matters for long context and/or quick responses. A lot of people really care about time-to-first-token.</p>
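<p>If you want to measure the two phases on your own card, llama.cpp's llama-bench reports pp and tg separately. A minimal example (the model path is a placeholder; flags as of recent llama.cpp):<p><pre><code># compare a Vulkan build against a ROCm/HIP build of llama.cpp
# pp512 = prompt processing, tg128 = token generation
./llama-bench -m model.gguf -p 512 -n 128</code></pre></p>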
]]></description><pubDate>Fri, 03 Apr 2026 03:06:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47622796</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47622796</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47622796</guid></item><item><title><![CDATA[New comment by suprjami in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>Following the current rule of thumb for MoE capacity, `sqrt(total_params * active_params)`, a 200B-A3B would have the intelligence of a ~24B dense model.<p>That seems pointless. You can already achieve that with a single 24G graphics card.<p>I wonder if the rule would even hold up at that level, as 3B active is really not a lot to work with. Qwen 3.5 uses 122B-A10B and is still only neck and neck with the 27B dense model.<p>I don't see any value proposition for these little boxes like DGX Spark and Strix Halo. Lots of RAM that's too slow to do anything useful except run mergekit. imo you'd be better off building a desktop computer with two 3090s.</p>
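<p>A quick sketch of that heuristic in Python (my own illustration; it's a community rule of thumb, not an official formula):<p><pre><code>import math

def dense_equivalent(total_b: float, active_b: float) -> float:
    """Rule of thumb: a MoE behaves roughly like a dense model of
    sqrt(total * active) parameters (the geometric mean)."""
    return math.sqrt(total_b * active_b)

print(dense_equivalent(200, 3))   # ~24.5 -> a 200B-A3B ~ 24B dense
print(dense_equivalent(122, 10))  # ~34.9 -> Qwen 3.5 122B-A10B ~ 35B dense</code></pre></p>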
]]></description><pubDate>Thu, 02 Apr 2026 21:42:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47620571</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47620571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47620571</guid></item><item><title><![CDATA[New comment by suprjami in "The Australian government has announced gambling advertising reforms"]]></title><description><![CDATA[
<p>I'd strongly support a year-based ban on cigarette purchases.<p>Set the purchase cutoff at the birth year of people turning 18 today. So DOB 2008 if done now; anyone born 2009 or later could never buy smokes at all.<p>Within two generations we'd largely eliminate smoking. Within three, cigarettes would be almost impossible to get. Great public health initiative.</p>
]]></description><pubDate>Thu, 02 Apr 2026 21:01:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47620140</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47620140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47620140</guid></item><item><title><![CDATA[New comment by suprjami in "Lemonade by AMD: a fast and open source local LLM server using GPU and NPU"]]></title><description><![CDATA[
<p>You have got to be joking.<p>My three NVIDIA cards are more power efficient than my one AMD card, both at idle and under load.<p>Official ROCm is like pulling teeth, with poor support for desktop cards. Debian, a volunteer-led project, has better ROCm CI than AMD and supports more cards.<p>Look at any benchmarks. NV midrange cards are faster than AMD's and at least a generation ahead. Owning a 7900XTX is an embarrassing disappointment.<p>I like AMD and want them to succeed, but they are way behind NV in this area.</p>
]]></description><pubDate>Thu, 02 Apr 2026 20:50:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47619989</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47619989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47619989</guid></item><item><title><![CDATA[New comment by suprjami in "From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem"]]></title><description><![CDATA[
<p>Some models suffer badly from KV quantisation. You can also take a speed hit from using dissimilar K and V types.<p>TurboQuant seems to be the next big thing in context memory usage: polar coordinates achieving a ~5x reduction in memory usage with minimal or no quality loss, and even a slight speedup in some cases.</p>
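<p>For anyone sanity-checking the per-token numbers, the standard KV cache size is straightforward to compute. A minimal Python sketch (the dimensions below are Llama-3-8B-like and are my assumption, not from the article):<p><pre><code>def kv_bytes_per_token(n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    # one K and one V vector per layer, per KV head
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# 32 layers, 8 KV heads (GQA), head_dim 128
print(kv_bytes_per_token(32, 8, 128, 2))  # fp16: 131072 B = 128 KiB/token
print(kv_bytes_per_token(32, 8, 128, 1))  # 8-bit KV: 64 KiB/token</code></pre></p>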
]]></description><pubDate>Tue, 31 Mar 2026 19:47:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47592474</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47592474</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47592474</guid></item><item><title><![CDATA[New comment by suprjami in "Quantization from the Ground Up"]]></title><description><![CDATA[
<p>Dual 3060s run 24B Q6 and 32B Q4 at ~15 tok/sec. That's fast enough to be usable.<p>Add a third one and you can run Qwen 3.5 27B Q6 with 128k ctx, for less than the price of a 3090.</p>
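<p>The back-of-envelope VRAM math, if anyone wants to check the fit (a rough Python sketch; the bits-per-weight values are approximate GGUF figures and ignore KV cache and runtime overhead):<p><pre><code>def model_gib(params_b: float, bits_per_weight: float) -> float:
    # approximate GGUF weight size: params * bpw / 8, in GiB
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

print(f"24B Q6_K   ~ {model_gib(24, 6.56):.1f} GiB")  # ~18.3 GiB
print(f"32B Q4_K_M ~ {model_gib(32, 4.85):.1f} GiB")  # ~18.1 GiB
# both fit across 2x12G = 24G, leaving a little room for context</code></pre></p>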
]]></description><pubDate>Thu, 26 Mar 2026 11:17:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47529095</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47529095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47529095</guid></item><item><title><![CDATA[New comment by suprjami in "Plasma Bigscreen – 10-foot interface for KDE plasma"]]></title><description><![CDATA[
<p>Look up "USB RF remote" on eBay. There are two common ones you'll see everywhere. I have one for my Kodi system.</p>
]]></description><pubDate>Sat, 07 Mar 2026 09:58:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47286146</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47286146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47286146</guid></item><item><title><![CDATA[New comment by suprjami in "Anthropic, please make a new Slack"]]></title><description><![CDATA[
<p>Please, anyone, make a new Slack. 4GB of RAM for a slow chat client with a bad interface is just so slovenly it should be illegal.</p>
]]></description><pubDate>Fri, 06 Mar 2026 23:44:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47282617</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47282617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47282617</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Yes, that's right. The config is described by the developer here:<p><a href="https://www.reddit.com/r/LocalLLaMA/comments/1rhohqk/comment/o8070gg/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/1rhohqk/comment...</a><p>And it's in the sample config too:<p><a href="https://github.com/mostlygeek/llama-swap/blob/main/config.example.yaml" rel="nofollow">https://github.com/mostlygeek/llama-swap/blob/main/config.ex...</a><p>iiuc MLX quants are not GGUFs for llama.cpp; they're a different file format which you use with the MLX inference server. LM Studio abstracts all that away, so you can just pick an MLX quant and it does the hard work for you. I don't have a Mac, so I haven't looked into this in detail.</p>
]]></description><pubDate>Mon, 02 Mar 2026 08:40:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47215345</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47215345</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47215345</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Shouldn't you be using MLX, since it's optimised for Apple Silicon?<p>Many user benchmarks report up to 30% better memory usage and up to 50% higher token generation speed:<p><a href="https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_ships_an_mlx_backend_run_any_llm_from/" rel="nofollow">https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_s...</a><p>As the post says, LM Studio has an MLX backend which makes it easy to use.<p>If you still want to stick with llama-server and GGUF, look at llama-swap, which lets you run one frontend that presents a list of models and dynamically starts a llama-server process with the right model, as sketched below:<p><a href="https://github.com/mostlygeek/llama-swap" rel="nofollow">https://github.com/mostlygeek/llama-swap</a><p>(actually you could run any OpenAI-compatible server process with llama-swap)</p>
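<p>The config shape is roughly like this (a minimal sketch from memory; the model name and path are placeholders, and config.example.yaml in the repo is the authoritative syntax):<p><pre><code># llama-swap starts the matching llama-server on demand
# and proxies OpenAI-compatible requests to it
models:
  "qwen3.5-35b":
    cmd: llama-server --port 9001 -m /models/qwen3.5-35b-q4_k_m.gguf
    proxy: http://127.0.0.1:9001</code></pre></p>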
]]></description><pubDate>Mon, 02 Mar 2026 04:59:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47214054</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47214054</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47214054</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Ah, thanks.<p>The names are so good and not at all repetitious.<p>No, not the RTX 6000. No, not the A6000...</p>
]]></description><pubDate>Sun, 01 Mar 2026 13:17:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206445</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47206445</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206445</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>Unsloth Dynamic. Don't bother with anything else.</p>
]]></description><pubDate>Sat, 28 Feb 2026 22:11:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47200839</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47200839</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47200839</guid></item><item><title><![CDATA[New comment by suprjami in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>The cheapest option is two 3060 12G cards. You'll be able to fit the Q4 of the 27B or 35B with an okay context window.<p>If you want to spend twice as much for more speed, get a 3090/4090/5090.<p>If you want long context, get two of them.<p>If you have enough spare cash to buy a car, get a 96G RTX PRO 6000 Blackwell.</p>
]]></description><pubDate>Sat, 28 Feb 2026 22:01:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47200765</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47200765</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47200765</guid></item><item><title><![CDATA[New comment by suprjami in "Tell HN: YC companies scrape GitHub activity, send spam emails to users"]]></title><description><![CDATA[
<p>Big deal, so does every other company.<p>If you're lonely, just upload a few AI keywords to a repo. You'll get emails forever.</p>
]]></description><pubDate>Thu, 26 Feb 2026 20:20:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47171498</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47171498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47171498</guid></item><item><title><![CDATA[New comment by suprjami in "Terence Tao, at 8 years old (1984) [pdf]"]]></title><description><![CDATA[
<p>At 8 years old I was able to expertly dismantle many radios.<p>Was still a few years away from reassembly.</p>
]]></description><pubDate>Tue, 24 Feb 2026 05:59:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47133374</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47133374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47133374</guid></item><item><title><![CDATA[New comment by suprjami in "'Peanut butter' pay raises could cost companies their top performers"]]></title><description><![CDATA[
<p>Next step is to skip the bread and eat Nutella from the jar with a spoon.</p>
]]></description><pubDate>Sun, 22 Feb 2026 20:39:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47114418</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47114418</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47114418</guid></item><item><title><![CDATA[New comment by suprjami in "Claws are now a new layer on top of LLM agents"]]></title><description><![CDATA[
<p>It feels to me like there are plenty of people running these on a "just trust the AI, bro" basis who are one hallucination away from having their entire bank account emptied.</p>
]]></description><pubDate>Sat, 21 Feb 2026 21:20:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47104895</link><dc:creator>suprjami</dc:creator><comments>https://news.ycombinator.com/item?id=47104895</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47104895</guid></item></channel></rss>