Hacker News: porkloin

New comment by porkloin in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

porkloin — Mon, 15 Jun 2026 18:09:42 +0000

I have good results with this setup:

Hardware:

- GPU: AMD 7900xtx, 24gb vram

- CPU: AMD 5950x, AM4

- RAM: 64gb DDR4 3600

Software:

- OS: Bazzite (atomic fedora - this machine is running Steam "big picture" mode on my TV when not in use for LLM tasks)

- Virtualization: Podman Quadlets, which allows me to run container images as managed systemd units

- Network: tailscale

- Inference: llama.cpp vulkan (better performance than ROCM, though I'm keeping an eye on it in the future)

- LLM API surface: llama-swap (running as a podman quadlet exposed via tailscale svc) allows running multiple models on a single endpoint.

- Web/Chat Access: open-webui (running as podman quadlet exposed via tailscale svc) allows me to access any of the models I'm using for coding harness access for chat/general purpose queries via web browser. I also have the "conduit" app for my iPhone that allows me to hit the same models from my phone.

Models:

- Qwen3.6-27B-MTP-UD-Q4_K_XL.gguf - Unsloth Q4 quant of the qwen 3.6 27B model weights, with MTP enabled. MTP is important as it improves the speed the model can run at.

- Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf - Unsloth Q4 quant of 35B-A3B. Not MTP right now because I was having some issues with it?

- gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf - Gemma 4, which I use sometimes via open-webui instead of Qwen, but I generally think Qwen does a better job

Flags (specific for Qwen 27b, since that's primary model):

- `-ngl 99` offload all layers to GPU

- `-c 80000` 80K context window. I'd like this to be higher, but since my GPU also has to run the desktop session for the machine, I need to leave some VRAM overhead to keep the desktop from OOM-ing

- `-np 1` single slot (no parallel request handling)

- `--no-context-shift` error instead of silently sliding the context window when full

- `--cache-reuse 256` reuse cached prefix in chunks of 256 tokens (prompt cache)

- `-b 2048` logical batch size (tokens per submission)

- `-ub 1024` physical micro-batch (per GPU pass)

- `--cache-type-k q8_0 --cache-type-v q8_0` symmetric 8-bit K/V cache. Q8 is as low as I've been able to go without getting some issues with tool calling

- `-fa on` flash attention

- `--spec-type draft-mtp` use the model's built-in MTP as the draft model

- `--spec-draft-n-max 3` propose up to 3 draft tokens per step

- `--spec-draft-n-min 0` allow zero drafts if confidence is low

- `--spec-draft-type-k q8_0 --spec-draft-type-v q8_0` KV quant for the draft path

- `--reasoning-format deepseek` parse blocks in proper format

- `--chat-template-kwargs '{"enable_thinking": true}'` turns on Qwen's thinking mode on by default (clients can override)

- `--jinja` use the GGUF's Jinja chat template

- `--temp 0.6` moderate randomness (Qwen recommended value for coding)

- `--top-p 0.95` nucleus sampling (Qwen recommended value for coding)

- `--top-k 20` top-20 candidates (Qwen recommended value for coding)

- `--min-p 0.0 disabled (Qwen recommended value for coding)

Performance (27b, primary model):

- ~65t/s for token generation

- ~600 t/s for prompt processing.

- If these numbers don't mean much to you, perceptually this feels about on-par with cloud model speed, maybe slightly faster.

- ~30s cold start when swapping from a different model or starting up session from idle via llama-swap.

I have llama-swap set up to unload the model after 10 min of idle, because I sometimes use this machine for gaming as well. A little annoying, but a small price to pay to be able to use the machine for other stuff (gaming) when I'm not using it with coding tasks.

CLI/Harness:

- Crush harness (https://github.com/charmbracelet/crush) less feature rich than Claude Code, but with a smaller system prompt and better built-in LSP support. I point it at the tailnet DNS (https://llama.:)

- Headroom (https://github.com/chopratejas/headroom) to maximize the 80k context window

- Exa MCP for web search (https://exa.ai/) this alone makes the model far more useable. It's shocking how often the official claude code or codex harness get botblocked on web fetches, and the results of a good web fetch can be the difference between a good turn and a bad turn.

A lot of people get hung up on whether Qwen 3.x models are "as smart as" some parallel Anthropic model. Most people seem to agree it's somewhere between Haiku 4.5 and Sonnet 4.5. Personally, I think the biggest thing that makes the Qwen 3.x series of models _feel_ good to use for coding workflows is that its the first time that tool calling actually works consistently on local models. If tool calling is busted even 5% of the time, it can totally ruin the flow. I think that's also why people tend to say the "harness is more important than the model" or whatever. I have a few other models set up but 27B with MTP is the best compromise of speed and quality that I've found.

This setup works well enough for me that I dropped my personal Claude Code subscription. At work I'm still using frontier models, but personally I don't feel like I need that much power for anything I work on in my personal life. I'm "lucky" that I made the random financially unwise choice to buy a 7900XTX in late 2022 for $1k as a gaming card. I had no clue it would actually be a pretty decent LLM card 3-4 years later.

Edit: sorry for the horrible formatting, I always forget that HN doesn't actually do markdown :(

New comment by porkloin in "What happens if Japan takes in zero immigrants?"

porkloin — Fri, 05 Jun 2026 07:22:48 +0000

I think a lot of places were at various points in time, right? It's always a little easier to paint with a broad brush and lump hundreds of years into a single statement when you're talking about the history of "foreign" places that we don't learn the history of very deeply in western education. For Japan in particular, it's hard because a big part of the Meiji nationalist movements was to recast Japanese history with a heavy focus on the "bushido" and an arguably manufactured version of some points in the country's history that _were_ undoutably bloody. That yarn-spinning from 100+ years ago has actively shaped how the rest of the world thinks about Japanese history. Western governments during WW2 were happy to take that narrative and paint the entire history as blood-soaked and brutal to dehumanize their enemies. But it's not hard to find evidence of how many long stretches of internal peace existed in Japan. After the establishment of the Tokugawa shogunate there was 250 years of more or less continuous peace internally. Arguably continental Europe from the middle-ages onward is more fractured and bloody in total than Japan was in the same period. Classical Greece was a zillion times more bloody.

New comment by porkloin in "What happens if Japan takes in zero immigrants?"

porkloin — Fri, 05 Jun 2026 05:14:58 +0000

The idea that Japan is a uniquely "homogeneous culture" is honestly a modern construct anyway. Japanese culture and language has been enormously influenced by colonial and migrant presence in the country, from Chinese to Dutch to British to American, and a zillion others.

Just look at the language! I don't have the exact figure in front of me, but I remember when taking Japanese language courses that something like 30% of the lexicon is loanwords from other languages (edit: I looked it up and it's apparently closer to 50%) Way higher than most other widely spoken languages on the planet. Japanese culture is legitimately _amazing_ in its capacity to absorb and domesticate outside influence, and it's unfortunate that people both in the country and abroad are so short-sighted to not see that.

The Meiji and Showa era militarism benefited a lot by promoting this myth. They weren't alone, mind you. Lots of folks across the EU and the US are still falling for the same nationalist stories that their governments cooked up in the early 1900s to drive them all to war.

The country _does_ have a really notable cohesion and shared identity, but the problem is in attributing that to some kind of unique isolationism rather than their long history of pluralism.

New comment by porkloin in "I Cancelled Claude: Token Issues, Declining Quality, and Poor Support"

porkloin — Fri, 24 Apr 2026 18:11:48 +0000

assuming you have a locally running llama-server or llama-swap, just drop this into your crush.json with your setup details/local addresses etc:

Edit: i forgot HN doesn't do code fences. See https://pastebin.com/2rQg0r2L

Obviously the context window settings are going to depend on what you've got set on the llama-server/llama-swap side. Multiple models on the same server like I have in the config snippet above is mostly only relevant if you're using llama-swap.

TL;DR is you need to set up a provider for your local LLM server, then set at least one model on that server, then set the large and small models that crush actually uses to respond to prompts to use that provider/model combo. Pretty straightforward but agree that their docs could be better for local LLM setups in particular.

For me, I've got llama-swap running and set up on my tailnet as a [tailscale service](https://tailscale.com/docs/features/tailscale-services) so I'm able to use my local LLMs anywhere I would use a cloud-hosted one, and I just set the provider baseurl in crush.json to my tailscale service URL and it works great.

New comment by porkloin in "“Your frustration is the product”"

porkloin — Thu, 19 Mar 2026 22:37:46 +0000

I think Kagi is kind of making this happen currently with search. Not sure how their adoption number are going, but people are willing to pay $$ for better search with no "sponsored content" rising to the top.

I'm hesitant about a lot of this stuff because it's very easy to get to a place where we let net neutrality degrade even more than it already has. Part of the way that platforms indoctrinate us to accept that paying extra for quality of service or "fast lanes" for specific content types are "necessary" is to degrade the existing experience so much that it seems inevitable.

New comment by porkloin in "Qatar helium shutdown puts chip supply chain on a two-week clock"

porkloin — Fri, 13 Mar 2026 20:08:41 +0000

Honestly even in "developed countries" it's not worth blindly trusting that the power in your house/building is clean. It's cheap and easy enough to just put any expensive hardware on a UPS rather than speculating what's going on behind the walls.

New comment by porkloin in "AIs can generate near-verbatim copies of novels from training data"

porkloin — Mon, 23 Feb 2026 17:33:40 +0000

I think it's important because there are a bunch of would-be claimants for intellectual property violation. Many people speculate that their work was used in training data, but it can be difficult to produce sufficient proof that their copyrighted work is present in the training data. If you could reliably get an LLM to produce 70% of a copyrighted book that would probably be enough to get a few lawyers salivating.

I didn't read the source paper referenced in the ars technica piece, but this statement about it makes me wonder how useful it actually is:

> But a study published last month showed that researchers at Stanford and Yale Universities were able to strategically prompt LLMs from OpenAI, Google, Anthropic, and xAI to generate thousands of words from 13 books, including A Game of Thrones, The Hunger Games, and The Hobbit.

It seems like well-known books with tons of summary, adaptations into film scripts, and tons of writing about the book in the overall corpus make it way less surprising to see be partially reproducible.

So I guess that's a lot of words to say - yeah until there's something definitive that allows people to prompt LLMs into either unlawfully recreating an entire work verbatim or otherwise indisputably proving that a copyrighted work was used in training data, there's probably nothing game changing in it.

New comment by porkloin in "Loops is a federated, open-source TikTok"

porkloin — Mon, 23 Feb 2026 06:00:18 +0000

Clearly discord has more of a vested interest in boosting engagement - especially now that they are showing people "quests". What a quirky and fun way to say "ads"!

But at the same time I don't necessarily buy the idea that all of their reactions/roles/badges/etc are exclusively malevolent engagement-driven design decisions meant to hook people. I do think that some of them are legitimate improvements to chat communication, and as a result many of those features have proliferated across other messaging platforms. Hell, most of them didn't even originate at discord at all but were cribbed from their competitors.

To be clear, I 1000% agree with you that IRC is less addicting. Even just by simple merit of not having multi-device push notifications. Those pull me back into the app. But push notifications across devices are also just objectively useful. I name that one in particular because it's one of the biggest and most notable features that prevents me from returning to IRC, where I happily did most of my chat until the mid 2010s. I'm actively shopping for a discord alternative as a regular user who is fed up with discord's march toward enshittification, and matrix looks like it gives me most of that convenience without the worst parts of discord.

New comment by porkloin in "Software Survival 3.0"

porkloin — Fri, 30 Jan 2026 22:43:05 +0000

The guy is high on his own supply. This entire thing reads like a fever dream.

New comment by porkloin in "The dank case for scrolling window managers"

porkloin — Fri, 30 Jan 2026 08:59:40 +0000

1000% agree - you said everything better that I was trying to say in my comment. Likewise coming from conventional TWMs I had some of the same struggles initially but the whole thing is just so smooth and config is so stupidly easy to work with. The docs are amazing and the community seems pretty boring in a good way :)

New comment by porkloin in "The dank case for scrolling window managers"

porkloin — Fri, 30 Jan 2026 08:50:33 +0000

The majority of the projects in this comment chain don't actually independently implement a compositor in Rust - which is a good thing IMHO. Cosmic and Pinaccle at least come from a common core written in rust that is associated with the cosmic project: https://github.com/Smithay/smithay/

New comment by porkloin in "The dank case for scrolling window managers"

porkloin — Fri, 30 Jan 2026 08:48:27 +0000

Currently using Niri and DMS via https://github.com/zirconium-dev/zirconium which is fedora bootc atomic + niri + dms. After taking a year or so away from tiling WMs where I was using KDE for a bit, I'm enjoying it quite a lot.

Super impressed by the "out of the box" experience given that it took a ton of sweat and tears to get these types of setups 10+ years ago when I posting stupid screenshots of my awesomewm and bspwm configs to /r/unixporn.

I wasn't so sure about the scrolling wm thing but I'm enjoying not having to worry about switching layouts constantly to "make room" like I always have in traditional tiling wms. Dynamic virtual desktops has taken some getting used to since I was a long-term adherent of the "10 static virtual desktops" way of thinking, but again it's been a good experience to just get used to the idea that each virtual desktop isn't as limited as it is in other WMs since you can have some content off screen.

I think an underrated aspect of Niri is that it's a cousin to System76's cosmic desktop: they share a base compositor through https://github.com/Smithay/smithay/. I think a big part of why Niri has been able to pull off such a polished experience has a lot to do with smart design from folks working on Smithay.

New comment by porkloin in "Please don't say mean things about the AI I just invested a billion dollars in"

porkloin — Thu, 29 Jan 2026 00:47:19 +0000

For me I guess I don't really see what it's adding. You can watch an actual video clip of Jensen begging people not to "bully" or say "hurtful" things about AI while wearing a stupid leather jacket. It's a million times funnier to watch him squirm in real life.

I find it unfunny for the same reason I don't find modern SNL intro bits about Trump funny. The source material is already insane to the point that it makes surface-level satire like this feel pointless.

New comment by porkloin in "Please don't say mean things about the AI I just invested a billion dollars in"

porkloin — Thu, 29 Jan 2026 00:19:57 +0000

I hate LLMs as much as the next guy, but this was honestly just not very funny. Humor can be a great vehicle for criticism when it's done right, but this feels like clickbait-level lazy writing. I wouldn't criticize it anywhere else, but I have enjoyed reading a bunch of actually good writing from mcsweeney's over the years in the actual literary journal and on their website.

New comment by porkloin in "Fedora Asahi Remix is now working on Apple M3"

porkloin — Mon, 26 Jan 2026 19:15:34 +0000

Are you sure you've actually used the higher refresh rate? It might not be enabled by default. I'd be surprised if you can't tell the difference comparing 60hz to 120hz back to back.

New comment by porkloin in "Roam 50GB is now Roam 100GB"

porkloin — Wed, 14 Jan 2026 18:48:17 +0000

I work remotely and use a starlink mini for work and general internet usage since I road trip in the summer a lot. For work I'm not using doing RDP/remote desktop stuff since I have a company-issued laptop, but I have some experience using it to stream graphics-intensive games from my home PC with a nice GPU to my phone with a mobile controller attached to it.

I saw around 50-100ms of latency in ideal conditions with a clear view of the sky. There are distinct large latency spikes every 30ish minutes, which I think is due to the dish switching between different satellites.

I think the latency would be fine for working, but it will hardly be transparent. When using it to play games, I've mostly stuck to stuff that doesn't require fast responses or parry mechanics, etc.

Even without RDP-ing into another workstation, the latency spikes on video calls can be noticeable. Moment-to-moment video conferencing latency is totally fine, given that most of the major players in the space have pretty good latency compensation baked in.

A few details/complications:

- I'm usually within ~500 miles of my home, which is relevant because starlink satellites communicate with ground stations, and being closer to home will still have a meaningful impact on latency

- host PC is on a wired fiber connection

- I live relatively far north (~65N) and starlink's network isn't biased toward polar orbiting satellites, so my coverage probaby isn't representative of behavior further south. You can see a map of satellites and note the relatively poor arctic and subarctic region coverage here: https://satellitemap.space/

New comment by porkloin in "Ask HN: ADHD – How do you manage the constant stream of thoughts and ideas?"

porkloin — Wed, 14 Jan 2026 00:41:37 +0000

Yep. I've stopped using them after years of em dash-ing. It just wasn't worth continuing to use them once that became everyone's default "written by LLM" heuristic. I think now I understand the pain the boomers and gen-x went through during the "double space after period means you're an oldster" era.

New comment by porkloin in "Ask HN: ADHD – How do you manage the constant stream of thoughts and ideas?"

porkloin — Wed, 14 Jan 2026 00:31:17 +0000

I fucking hate responses like the one you're responding to. I'm someone who writes long comments, and I've always been that way. The last few years have been awful because it doesn't take long for someone who disagrees with me to just accuse my writing of being an LLM because it's longer than a few sentences and reasonably well written.

I don't ultimately really care if the person they were accusing of using an LLM was actually using an LLM to write it. But people who accuse any comment longer than a paragraph of being LLM content are asses.

New comment by porkloin in "Cowork: Claude Code for the rest of your work"

porkloin — Tue, 13 Jan 2026 00:59:49 +0000

Yes, and I think we're already seeing that in the general trend of recent linux work toward atomic updates. [bootc](https://developers.redhat.com/articles/2024/09/24/bootc-gett...) based images are getting a ton of traction. [universal blue](https://universal-blue.org/) is probably a better brochure example of how bootc can make systems more resilient without needing to move to declarative nix for the entire system like you do in NixOS. Every "upgrade" is a container deployment, and you can roll back or forward to new images at any time. Parts of the filesystem aren't writeable (which pisses people off who don't understand the benefit) but the advantages for security (isolating more stuff to user space by necessity) and stability (wedged upgrades are almost always recoverable) are totally worth it.

On the user side, I could easily see [systemd-homed](https://fedoramagazine.org/unlocking-the-future-of-user-mana...) evolving into a system that allows snapshotting/roll forward/roll back on encrypted backups of your home dir that can be mounted using systemd-homed to interface with the system for UID/GID etc.

These are just two projects that I happen to be interested in at the moment - there's a pretty big groundswell in Linux atm toward a model that resembles (and honestly even exceeds) what NixOS does in terms of recoverability on upgrade.

New comment by porkloin in "AI is a business model stress test"

porkloin — Sun, 11 Jan 2026 02:52:24 +0000

Yep - exactly. Ops isn't immune to LLMs stealing your customers. Given that most of the "open source product with premium hosting" models are just reselling hyperscaler compute at a huge markup, the customers are going to realize pretty quickly that they can use an LLM to setup some basic devops and get the same uptime. Most of these companies are offering a middleman service that becomes a bad deal the moment the customer has access to expertise they previously lacked.

I also think he's glossing over the fact that one of the reasons why companies choose to pay for "ops" to run their software for them is because it's built by amateurs or amateurs-playing-professional and runs like shit. I happen to know this first hand from years of working at a company selling hosting and ops for the exact same CMS that Dries' business hosts (Drupal, a PHP-based CMS) and the absolute garbage that some people are able to put together in frameworks like Wordpress and Drupal is truly astounding. I'm not even talking about the janky local businesses where their nephew who was handy with computers made them a Wordpress site - big multinational companies have sites in these frameworks that can barely handle 1x their normal traffic and more or less explode at 1.5x.

The business of hosting these customers' poorly optimized garbage remains a big business. But we're entering into an era where the people who produce poorly optimized software have a different path to take rather than throwing it to a SaaS platform that can through sheer force of will make their lead-weight airplane fly. They can spend orders of magnitude less money to pay an LLM to make the software actually just not run like shit in the first place. Throwing scaling at the problem of 99.95% is a blunt instrument that only works if the person paying doesn't have the time, money, or knowledge to do it themselves.

Companies like these (including the one I work for currently) are absolutely going to get squeezed from both directions. The ceiling is coming down as more realize they can do their own devops, and the floor is rising as customer code quality gets better. Eventually you have to try your best to be 3 ft tall instead of 6.