Hacker News: p12tic

New comment by p12tic in "OpenAI closes funding round at an $852B valuation"

p12tic — Tue, 31 Mar 2026 21:10:21 +0000

> Only a matter of time for local models to reach Opus level. We are 1 or at most 2 years behind that and Anthropic knows that.

Can confirm. Kimi K2.5 is pretty intelligent and most of the time there's no difference between Opus and Kimi.

New comment by p12tic in "Britain today generating 90%+ of electricity from renewables"

p12tic — Sat, 28 Mar 2026 12:43:37 +0000

> In the last few years California runs 100% renewable on many days (and growing) every year.

How many is "many days"? Gas is still used for at least one fifth of electricity. https://app.electricitymaps.com/map/zone/US-CAL-CISO/5y/mont...

New comment by p12tic in "CES 2026: Taking the Lids Off AMD's Venice and MI400 SoCs"

p12tic — Tue, 06 Jan 2026 23:53:27 +0000

> not ... web/db servers, lightweight stuff like that.

They scale very well for web and db servers as well. You just put lots of containers/VMs on a single server.

AMD EPYC has a separate core architecture specifically for such workloads. It's a bit weaker, runs at a lower frequency and power, and takes less silicon area. This way AMD can put more such cores on a single CPU (192 vs 128 for Zen 5c vs 5). So it's the other way round - web servers love high core count CPUs.

New comment by p12tic in "I regret building this $3000 Pi AI cluster"

p12tic — Fri, 19 Sep 2025 21:11:18 +0000

Depends on the server. This test measured 79W idle for a _two socket_ E5-2690 v4 server.

https://www.servethehome.com/lenovo-system-x3650-m5-workhors...

New comment by p12tic in "I regret building this $3000 Pi AI cluster"

p12tic — Fri, 19 Sep 2025 21:05:55 +0000

The problem is with the form factor, not the server hardware per se. If one buys a regular ATX motherboard that accepts server CPUs and fits it in a regular ATX case, then there's lots of space for a relatively silent CPU air cooler. A 2690 v4 idles at less than 40W, which is not much more than a regular gaming desktop with a powerful GPU.

The only problem in practice is that server CPUs don't support S3 suspend, so putting the whole thing to sleep after finishing with it doesn't work.

New comment by p12tic in "I regret building this $3000 Pi AI cluster"

p12tic — Fri, 19 Sep 2025 20:59:28 +0000

Better to build a single workstation: less noise, less power usage, and the form factor is way more convenient. A budget of $3000 can buy 128 cores with 512GB of RAM on a single regular EATX motherboard, a case, a power supply and other accessories. Power usage is ~550W at maximum utilization, which is not much more than a gaming rig with a powerful GPU.

New comment by p12tic in "%CPU utilization is a lie"

p12tic — Wed, 10 Sep 2025 19:34:38 +0000

> Today it's a bit more complicated when you have servers with 100+ cores as an option for under $30k (guestimate based on $10k CPU price).

If one can buy used, then a previous-generation 128C 256T EPYC server is less than $5k. For homelabs that can accept non-rackmount gear it's less than $3k.

New comment by p12tic in "Overengineering my homelab so I don't pay cloud providers"

p12tic — Fri, 08 Aug 2025 20:18:55 +0000

That's just an artifact of Intel disabling ECC on consumer processors.

There's no reason for ECC to have significantly higher power consumption. It's just an additional memory chip per stick and a tiny bit of additional logic on the CPU side to calculate ECC.

If power consumption is the target, ECC is not a problem. I know firsthand that even old Xeon D servers can hit 25W full system idle. On the AMD side, the 4850G has 8 cores and can hit sub-25W full system idle as well.

New comment by p12tic in "Why DeepSeek is cheap at scale but expensive to run locally"

p12tic — Tue, 03 Jun 2025 20:36:23 +0000

We both agree. Batch size 1 is only relevant to people who want to run models on their own private machines, which is the case for OP.

New comment by p12tic in "Why DeepSeek is cheap at scale but expensive to run locally"

p12tic — Sun, 01 Jun 2025 22:00:42 +0000

All of this is for batch size 1.

New comment by p12tic in "Why DeepSeek is cheap at scale but expensive to run locally"

p12tic — Sun, 01 Jun 2025 20:51:19 +0000

The state of the art for local models is even further along.

For example, look into https://github.com/kvcache-ai/ktransformers, which achieves >11 tokens/s on a relatively old two-socket Xeon server plus a retail RTX 4090 GPU. Even more interesting is the prefill speed of more than 250 tokens/s. This is very useful in use cases like coding, where large prompts are common.

The above is achievable today. In the meantime, Intel engineers are working on something even more impressive. In https://github.com/sgl-project/sglang/pull/5150 they claim to achieve >15 tokens/s generation and >350 tokens/s prefill. They don't share what exact hardware they run this on, but from bits and pieces across various PRs I reverse-engineered that they use 2x Xeon 6980P with MRDIMM 8800 RAM, without a GPU. The total cost of such a setup will be around $10k once cheap engineering samples hit eBay.
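
To put the prefill numbers in perspective, here is a rough sketch of how long a large prompt takes to ingest at the rates quoted above. The 20,000-token prompt size is a hypothetical example, and the rates are the quoted lower bounds, so the results are upper bounds on the wait:

```python
# Time spent ingesting a prompt before the first output token appears.
# The rates are the lower bounds quoted above; the prompt size is a
# hypothetical example of a large coding prompt.

def prefill_seconds(prompt_tokens: int, prefill_tokens_per_s: float) -> float:
    """Seconds needed to process the whole prompt at a given prefill rate."""
    return prompt_tokens / prefill_tokens_per_s

PROMPT_TOKENS = 20_000  # hypothetical prompt size

RATES = {
    "ktransformers, 2x Xeon + RTX 4090": 250.0,  # ">250 tokens/s"
    "sglang PR, CPU only": 350.0,                # ">350 tokens/s"
}

waits = {name: prefill_seconds(PROMPT_TOKENS, rate) for name, rate in RATES.items()}

for name, seconds in waits.items():
    print(f"{name}: at most ~{seconds:.0f} s to ingest {PROMPT_TOKENS} tokens")
```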

New comment by p12tic in "Negotiating PoE+ Power in the Pre‑Boot Environment"

p12tic — Wed, 28 May 2025 12:53:35 +0000

Incorrect. https://en.wikipedia.org/wiki/USB_hardware#USB_Power_Deliver... is a good start about the subject: "PD-aware devices implement a flexible power management scheme by interfacing with the power source through a bidirectional data channel and requesting a certain level of electrical power <...>".

New comment by p12tic in "The Llama 4 herd"

p12tic — Sat, 05 Apr 2025 19:18:59 +0000

For all intents and purposes the cache might as well not exist when the working set is 17B or 109B parameters. So it's still better that fewer parameters are activated for each token. 17B parameters run ~6x faster than 109B parameters simply because less data needs to be loaded from RAM.
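
A back-of-the-envelope sketch of where the ~6x comes from, assuming token generation is purely memory-bandwidth bound. The bandwidth and bytes-per-parameter values are illustrative assumptions, not measurements; the ratio does not depend on them:

```python
# Rough estimate of memory-bandwidth-bound decode speed.
# Assumption: every active parameter is read from RAM once per token and
# nothing else (compute, KV cache, CPU caches) limits throughput.

def tokens_per_second(active_params: float, bytes_per_param: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when the weights must be streamed from RAM."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)

BANDWIDTH = 200e9      # hypothetical 200 GB/s of RAM bandwidth
BYTES_PER_PARAM = 1.0  # hypothetical 8-bit weights

sparse = tokens_per_second(17e9, BYTES_PER_PARAM, BANDWIDTH)
dense = tokens_per_second(109e9, BYTES_PER_PARAM, BANDWIDTH)
speedup = sparse / dense  # equals 109 / 17 whatever the assumed values are

print(f"17B active: {sparse:.1f} tok/s, 109B active: {dense:.1f} tok/s, "
      f"ratio {speedup:.1f}x")
```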

New comment by p12tic in "The first yearly drop in average CPU performance in its 20 years of benchmarks"

p12tic — Wed, 12 Feb 2025 05:36:02 +0000

Most laptops are severely limited by heat dissipation. So it's normal that performance is much worse. The CPU cannot stay in turbo as long and must drop to lower frequencies sooner. On longer benchmarks the CPU starts throttling due to heat and becomes even slower.

New comment by p12tic in "WASM will replace containers"

p12tic — Wed, 12 Feb 2025 05:32:18 +0000

The container security boundary can be much stronger if one wants.

One can use something like https://github.com/google/gvisor as a container runtime for podman or docker. It's a good hybrid between VMs and containers. The container is put into a sort of VM via KVM, but it does not supply a kernel and talks to a fake one. This means that the security boundary is almost as strong as a VM's, but almost everything will work like in a normal container.

E.g. here I can read the host filesystem even though uname says weird things about the kernel the container is running on:

  $ sudo podman run -it --runtime=/usr/bin/runsc_wrap -v /:/app debian:bookworm  /bin/bash
  root@7862d7c432b4:/# ls /app
  bin   home            lib32       mnt   run   tmp      vmlinuz.old
  boot  initrd.img      lib64       opt   sbin  usr
  dev   initrd.img.old  lost+found  proc  srv   var
  etc   lib             media       root  sys   vmlinuz
  root@7862d7c432b4:/# uname -a
  Linux 7862d7c432b4 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 GNU/Linux

New comment by p12tic in "WASM will replace containers"

p12tic — Wed, 12 Feb 2025 05:28:50 +0000

E.g. here I can read the host filesystem even though uname says weird things about the kernel the container is running on:

  $ sudo podman run -it --runtime=/usr/bin/runsc_wrap -v /:/app debian:bookworm  /bin/bash
  root@7862d7c432b4:/# ls /app
  bin   home            lib32       mnt   run   tmp      vmlinuz.old
  boot  initrd.img      lib64       opt   sbin  usr
  dev   initrd.img.old  lost+found  proc  srv   var
  etc   lib             media       root  sys   vmlinuz
  root@7862d7c432b4:/# uname -a
  Linux 7862d7c432b4 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 GNU/Linux

gVisor lets one have a strong sandbox without resorting to WASM.

New comment by p12tic in "Migrating 1 terabyte of files from OneDrive to Nextcloud"

p12tic — Fri, 18 Nov 2022 21:11:38 +0000

Seems like that's true: https://learn.microsoft.com/en-us/sharepoint/redirect-known-...

New comment by p12tic in "Linux: What Can You Epoll?"

p12tic — Sat, 22 Oct 2022 22:26:20 +0000

It's complicated: memory accesses can really block for relatively long periods of time.

Consider that regular memory access via cache takes around 1 nanosecond.

If the data is not in top-level cache, then we're looking at roughly 10 nanoseconds access latency.

If the data is not in cache at all, we are looking at 50-150 nanoseconds access latency.

If the data is in memory, but that memory is attached to another CPU socket, the latency is even higher.

Finally, if the data access is via an atomic instruction and there are many other CPUs accessing the same memory location, then the latency can be as high as 3000 nanoseconds.

It's not very hard to find NVMe-attached storage that has latencies of tens of microseconds, which is not very far off memory access speeds.
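
To make the comparison concrete, here is a small sketch that puts the figures above on one scale. Ranges are collapsed to a single representative value, and the NVMe number is an assumed 20 microseconds standing in for "tens of microseconds":

```python
# Latency figures quoted above, in nanoseconds. The main-memory value is
# the midpoint of the 50-150 ns range; the NVMe value is an assumption.
LATENCY_NS = {
    "top-level cache hit": 1,
    "lower-level cache hit": 10,
    "main memory (local socket)": 100,
    "contended atomic": 3000,
    "fast NVMe read (assumed)": 20_000,
}

def ratio(slow: str, fast: str) -> float:
    """How many times slower one kind of access is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

for name, ns in LATENCY_NS.items():
    print(f"{name:28s} {ns:>7d} ns  ({ratio(name, 'top-level cache hit'):.0f}x a cache hit)")

# Under these assumptions a fast NVMe read is within one order of
# magnitude of a heavily contended atomic memory access.
nvme_vs_atomic = ratio("fast NVMe read (assumed)", "contended atomic")
```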

New comment by p12tic in "Thoughts on Intel software-defined silicon"

p12tic — Sun, 20 Feb 2022 12:55:03 +0000

That would be a significant downgrade. Threadripper CPUs top out at 64 cores / 128 threads.

New comment by p12tic in "How to pick a good monitor for software development"

p12tic — Mon, 07 Feb 2022 09:28:15 +0000

I agree that just buying the most expensive monitor is a waste of resources. On the other hand, I think price shouldn't even come into the picture when deciding on basic things, such as the number of monitors or whether to choose a 4K monitor or just FHD.

I guess the most reasonable approach is to select a set of models that pass the requirements and only then think about the price.

Some requirements are really expensive though. If one insists on OLED and wants to fit 3 4K monitors side by side, the full setup cost is around $10k.