Hacker News: dabockster

New comment by dabockster in "MAI-Code-1-Flash"

dabockster — Wed, 03 Jun 2026 21:46:21 +0000

Qwen HAS to be a part of the discussion here, even though Microsoft is a US based entity. Their 30b MoE models absolutely hit way above their weight when paired with the right harness program, and can be ran on "Costco gaming computer" specs when configured correctly in llama.cpp.

Sorry Trump Administration, but while the US has been downloading more ram by throwing data centers at everything and burning up everyone's power and water, China has come out with what's effectively a prototype edge compute capable AI model - regardless of how they built it. And arguably I can tokenmaxx on it just fine at around 30-40 tokens/sec.

And also, ASICs are on the way. Imagine one of those with a heavy hitting model (MoE or otherwise, Qwen or otherwise) installed in a PCIe slot at 10k+ tokens/sec and 75 watts max (maximum wattage deliverable by the PCIe slot alone) for $300-400 USD each.

https://taalas.com/the-path-to-ubiquitous-ai/

ASIC demo here: https://chatjimmy.ai/

Sorry/not sorry to rip this whole thing to shreds. But I'm sick and tired of these inefficient LLMs being produced that seemingly can only be offered by subscription from a data center, when I'm running a full AI stack right now (model and all) on my computer at home on a 750 watt max power supply. Microsoft really needs to get with the picture here and compete more with Qwen instead of just the US/EU entities.

Sincerely, your neighbor down in Tacoma. https://www.youtube.com/watch?v=V9jlo4Ht2YA&t=229s

New comment by dabockster in "Mozilla's opposition to Chrome's Prompt API"

dabockster — Thu, 30 Apr 2026 16:08:12 +0000

> There should be no ability to "verify" a browser, and anyone should be able to emulate any browser.

Hard disagree. The AI industry has absolutely shredded the various anti-scraping and anti-botting social contracts that were in place prior to the covid pandemic. Like it's now common knowledge that robots.txt isn't a hard requirement and can be avoided entirely, for example. They have absolutely turned the open web into a dark forest.

Having a browser session able to be verified as untampered and/or "trusted" is probably going to be a thing going forward. Sucks a ton, but we all did this to ourselves.

New comment by dabockster in "Apple discontinues the Mac Pro"

dabockster — Thu, 02 Apr 2026 23:20:32 +0000

Because they're stupid and only buys stuff that's "safe". Because nobody gets fired for buying IBM.

New comment by dabockster in "Apple discontinues the Mac Pro"

dabockster — Fri, 27 Mar 2026 15:15:25 +0000

Intel could position their cards as strong for certain workloads. They had AV1 support first in market, for example.

New comment by dabockster in "Apple discontinues the Mac Pro"

dabockster — Fri, 27 Mar 2026 15:13:45 +0000

Thunderbolt is really an unsung hero here. It is surprisingly nice to be able to move various components around my desk that would have otherwise sat in a huge tower hogging all the PCIe slots they can find.

New comment by dabockster in "Apple discontinues the Mac Pro"

dabockster — Fri, 27 Mar 2026 15:10:45 +0000

You don’t need it if you use llamacpp on Windows, or if you compile it on Linux with CUDA 13 and the correct kernel HMM support, and you’re only using MoE models (which, tbh, you should be doing anyways).

New comment by dabockster in "Apple discontinues the Mac Pro"

dabockster — Fri, 27 Mar 2026 15:08:43 +0000

This needs to be sold as the big ticket item for low level devs. Their chips are some of the most power efficient chips on the market right now.

Hoping they release a blade server version somehow.

New comment by dabockster in "Apple discontinues the Mac Pro"

dabockster — Fri, 27 Mar 2026 15:07:07 +0000

CUDA 13 on Linux solves the unified memory problem via HMM and llamacpp. It’s an absolute pain to get running without disabling Secure Boot, but that should be remedied literally next month with the release of Ubuntu 26.04 LTS. Canonical is incorporating signed versions of both the new Nvidia open driver and CUDA into its own repo system, so look out for that. Signed Nvidia modules do already exist right now for RHEL and AlmaLinux, but those aren’t exactly the best desktop OSes.

But yeah, right now Apple actually has price <-> performance captured a lot of you’re buying a new computer just in general.

New comment by dabockster in "Wine 11 rewrites how Linux runs Windows games at kernel with massive speed gains"

dabockster — Thu, 26 Mar 2026 01:23:31 +0000

> For example, when my Windows gaming machine comes out of hibernation my ethernet controller insists that there's no connection. I can't convince it otherwise except by disabling the device and re-enabling it. I can't figure out where I might find information that tells me why this is happening, so I just wrote a powershell script to turn it off and then on again. I bet some Windows IT dork could figure it out in 30 seconds

Windows and Linux dork here (heh). It has to do with how various computer manufacturers implemented the Sleep/Standby State (S3/S4), how they've resisted implementing a common standard at the hardware level, and how Microsoft eventually gave up arguing and patched around it with their own Modern Standby system in the S0 state.

https://learn.microsoft.com/en-us/windows-hardware/design/de...

Tbh, though, the only computer I've ever seen Hibernate work well on are Macs. Every x86 computer usually has some sort of issue with it, except for maybe business laptop models (eg HP's Elitebook line).

New comment by dabockster in "Apple Business"

dabockster — Wed, 25 Mar 2026 23:30:54 +0000

"If Apple Business were a real revenue source, if they charged luxury prices for a luxurious business support experience, they could pay for developers to fix their stuff. Instead, Apple Business is a free side hustle for Apple, a hobby."

I'm wrestling with something similar to this right now in Linux. The only real player that charges "enough" to have a "absolutely zero tolerance for base OS breakage" approach to OS development is Red Hat. Ubuntu LTS is more widespread but only really because it's $0 even for large businesses, and that's honestly reflected in it sometimes having hardware breakage during a version's initial two year mainstream support run. Having Windows's business backed level of "doesn't break" on hardware is rare on Linux.

New comment by dabockster in "Apple Business"

dabockster — Wed, 25 Mar 2026 23:17:50 +0000

Bigger one:

* Predictability - eliminating the number of unknown factors that could cause a person to have issues using their computer. Reminds me of how a secretary I serviced was somehow able to install Google Desktop back in the day, and how that caused a massive argument between my boss and theirs when their computer needed to be re-imaged. Most IT approved programs are known to store user data in known locations on a computer, which makes backups and restorations very easy. Stuff like Google Desktop did not do that, which means likely breaking someone's workflow in the re-image process.

Tell HN: Llamacpp now supports unified system RAM offloading on Linux

dabockster — Tue, 24 Mar 2026 23:16:34 +0000

I'm a big fan of on-device AI inference for a million reasons, especially its potential to significantly reduce or even potentially eliminate the need for massive AI data center projects in the United States. But so far, the only place I've gotten that to work was on Windows by abusing the living heck out of its GPU shared virtual memory management with llamacpp. And since Windows isn't exactly the best OS these days, I've been looking at alternatives.

Recent changes in the core llamacpp code, the Linux kernel, the new "open" Nvidia driver, and CUDA 13 have finally enabled similar behavior on supported Linux based operating systems. I've tested the compilation on two distros and have confirmed that Unified/Heterogeneous Memory Management is finally working!

https://github.com/ggml-org/llama.cpp

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

A note of caution though: Nvidia's setup guide does not account for Secure Boot. Ubuntu and RHEL have pre-signed drivers in their repos. Otherwise you have to remember to run mokutils before rebooting. And for those of you who don't care if a software supply chain runs through a sewer, there's stuff like rpmfusion and the AUR of course. Another thing to note is that Ubuntu 26.04 LTS, releasing next month April 2026, is supposed to have an even easier way of installing a full CUDA dev environment in Linux. I'm really eager to try it out then.

But yeah, bottom line is you apparently can use unified memory functionality similar to MacOS on Linux now for AI inference. Combined with the right cli flags and sparse activated model (eg Qwen 3.5 35B A3B), AI is fully capable on something equivalent to what you'd see advertised as a "gaming" computer on the shelf at a Best Buy or Costco. Think RTX 3060, i5/Ryzen 5, 32gb ram (DDR4 or DDR5), 500-700 watt power supply, either air cooled or using a closed loop liquid cooler (no city water supply needed). And with llamacpp's built-in server (or your own code), you could arguably have your team's own private AI hub running on a box in a closet somewhere. And it's only going to get better from here. Who knows - we might go to CPU only soon with the right math work.

Sorry Sam. Sorry Elon. Sorry Mark. Sorry Dario and Daniela. You're all history. AI has been freed - figuratively, socially, and financially. Enjoy tokenmaxxing with your local on-device setups for $0/month, everyone!

---

I also want to point out that it's entirely coincidental that I found this out on the day of a major inference framework security breach (LiteLLM). Hope they pull through alright. As of this writing, I am unaware of any such issues in the llamacpp project.

Comments URL: https://news.ycombinator.com/item?id=47510953

Points: 6

# Comments: 0

New comment by dabockster in "Statement from Dario Amodei on our discussions with the Department of War"

dabockster — Thu, 26 Feb 2026 23:40:15 +0000

False. Local races directly determine the day-to-day laws and rules you live under way more than a POTUS could effectively decree. I don't know about you, but I sure enjoy having reliable electrical, water, and sewer systems.

New comment by dabockster in "Statement from Dario Amodei on our discussions with the Department of War"

dabockster — Thu, 26 Feb 2026 23:34:42 +0000

In the US, we have the ability to either confirm or change a significant chunk of our Federal government roughly every two years via the House of Representatives. The argument here is that we, theoretically, could collectively elect people that are hostile to domestic mass surveillance into the House of Representatives (and other places if able) and remove pro-surveillance incumbents from power on this two year cycle.

The reasons this hasn't happened yet are many and often vary by personal opinion. My top two are:

1) Lack of term limits across all Federal branches

and

2) A general lack of digital literacy across all Federal branches

I mean, if the people who are supposed to be regulating this stuff ask Mark Zuckerberg how to send an email, for example, then how the heck are they supposed to say no to the well dressed government contractor offering a magical black box computer solution to the fear of domestic terrorism (regardless of if its actually occurring or not)?

New comment by dabockster in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"

dabockster — Thu, 29 Jan 2026 01:47:15 +0000

> the Linux kernel and Linux distros both leave that unified memory capability up to the GPU driver to implement

Depends on if AMD (or Intel, since Arc drivers are supposedly OSS as well) took the time to implement that. Or if a Linux based OS/distro implements a Linux equivalent to the Windows Display Driver Model (needs code outside of the kernel and specific to the developed OS/distro to do).

So far, though, it seems like people are more interested in pointing fingers and sucking up the water of small town America than actually building efficient AI/graphics tech.

New comment by dabockster in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"

dabockster — Tue, 27 Jan 2026 19:34:58 +0000

> 'secure' (by executive standards)

"Secure" in the sense that they can sue someone after the fact, instead of preventing data from leaking in the first place.

New comment by dabockster in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"

dabockster — Tue, 27 Jan 2026 19:33:12 +0000

Hanlon's razor.

"Never attribute to malice that which is adequately explained by stupidity."

Yes, I'm calling labs that don't distill smaller sized models stupid for not doing so.

New comment by dabockster in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"

dabockster — Tue, 27 Jan 2026 19:02:27 +0000

You can run AI models on unified/shared memory specifically on Windows, not Linux (unfortunately). It uses the same memory sharing system that Microsoft originally had built for gaming when a game would run out of vram. If you:

- have an i5 or better or equivalent manufactured within the last 5-7 years

- have an nvidia consumer gaming GPU (RTX 3000 series or better) with at least 8 GB vram

- have at least 32 GB system ram (tested with DDR4 on my end)

- build llama-cpp yourself with every compiler optimization flag possible

- pair it with a MoE model compatible with your unified memory amount

- and configure MoE offload to the CPU to reduce memory pressure on the GPU

then you can honestly get to about 85-90% of cloud AI capability totally on-device, depending on what program you interface with the model.

And here's the shocking idea: those system specs can be met by an off the shelf gaming computer from, for example, Best Buy or Costco today and right now. You can literally buy a CyberPower or iBuyPower model, again for example, download the source, run the compilation, and have that level of AI inference available to you.

Now, the reason why it won't work on Linux is that the Linux kernel and Linux distros both leave that unified memory capability up to the GPU driver to implement. Which Nvidia hasn't done yet. You can code it somewhat into source code, but it's still super unstable and flaky from what I've read.

(In fact, that lack of unified memory tech on Linux is probably why everyone feels the need to build all these data centers everywhere.)

New comment by dabockster in "Apple picks Gemini to power Siri"

dabockster — Mon, 12 Jan 2026 17:03:28 +0000

It also lets them keep a lot of the legal issues regarding LLM development at arms length while still benefiting from them.

New comment by dabockster in "Meta is using the Linux scheduler designed for Valve's Steam Deck on its servers"

dabockster — Tue, 23 Dec 2025 19:30:10 +0000

It's Meta. They always push to be that fast on paper, even when it's costly to do and doesn't really need it.