<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ipieter</title><link>https://news.ycombinator.com/user?id=ipieter</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 11:42:06 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ipieter" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Canada's AI Startup Cohere Buys Germany's Aleph Alpha to Expand in Europe]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.reuters.com/legal/transactional/canadas-cohere-germanys-aleph-alpha-announce-merger-handelsblatt-reports-2026-04-24/">https://www.reuters.com/legal/transactional/canadas-cohere-germanys-aleph-alpha-announce-merger-handelsblatt-reports-2026-04-24/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47887383">https://news.ycombinator.com/item?id=47887383</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 24 Apr 2026 08:35:12 +0000</pubDate><link>https://www.reuters.com/legal/transactional/canadas-cohere-germanys-aleph-alpha-announce-merger-handelsblatt-reports-2026-04-24/</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=47887383</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47887383</guid></item><item><title><![CDATA[New comment by ipieter in "Claude Token Counter, now with model comparisons"]]></title><description><![CDATA[
<p>There is currently very little evidence that morphological tokenizers improve model performance [1]. For languages like German (where words get glued together into compounds) there is a bit more evidence (e.g. a paper I worked on [2]), but overall I am starting to suspect the bitter lesson also holds for tokenization.<p>[1] <a href="https://arxiv.org/pdf/2507.06378" rel="nofollow">https://arxiv.org/pdf/2507.06378</a><p>[2] <a href="https://pieter.ai/bpe-knockout/" rel="nofollow">https://pieter.ai/bpe-knockout/</a></p>
]]></description><pubDate>Mon, 20 Apr 2026 06:04:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47830894</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=47830894</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47830894</guid></item><item><title><![CDATA[New comment by ipieter in "Why DeepSeek is cheap at scale but expensive to run locally"]]></title><description><![CDATA[
<p>Distributing inference per layer, instead of splitting each layer across GPUs, is indeed another approach, called pipeline parallelism. However, per batch there is less compute available (only one GPU works at a time), so inference is slower. In addition, orchestrating the start of the next batch on GPU #0 while GPU #1 is still processing the previous one is quite tricky. For these reasons, tensor parallelism as I described is far more common in LLM inference.</p>
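The "only one GPU works at a time" cost can be made concrete with a small sketch. Assuming a simple GPipe-style schedule (my assumption, not something from the comment above), a pipeline of P stages fed B micro-batches takes B + P - 1 stage-steps, so the fraction of GPU time doing useful work is B/(B + P - 1):

```python
# Utilization of a simple GPipe-style pipeline schedule: P stages,
# B micro-batches, B + P - 1 stage-steps total. The remainder is the
# pipeline "bubble" where some GPUs sit idle.
def pipeline_utilization(stages: int, microbatches: int) -> float:
    return microbatches / (microbatches + stages - 1)

# A single batch on 8 GPUs keeps only one GPU busy at a time:
print(pipeline_utilization(8, 1))    # 0.125
# Many micro-batches amortize the bubble:
print(pipeline_utilization(8, 64))   # ~0.90
```

This is why pipeline parallelism only pays off with many micro-batches in flight, which is exactly the orchestration headache described above.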
]]></description><pubDate>Mon, 02 Jun 2025 22:44:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44164040</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=44164040</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44164040</guid></item><item><title><![CDATA[New comment by ipieter in "Why DeepSeek is cheap at scale but expensive to run locally"]]></title><description><![CDATA[
<p>This is an interesting blogpost. While the general conclusion ("We need batching") is true, inference of mixture-of-experts (MoE) models is actually a bit more nuanced.<p>The main reason we want big batches is that LLM inference is limited not by compute, but by loading every single weight out of VRAM. Just compare the number of TFLOPS of an H100 with its memory bandwidth: there is room for roughly 300 FLOPs per byte loaded. That's why we want big batches: we can perform a lot of operations per parameter/weight that we load from memory. This kind of analysis is often referred to as the "roofline model".<p>As models become bigger, this stops scaling, because the model weights no longer fit into the memory of a single GPU and you need to distribute them across GPUs or across nodes. Even with NVLink and InfiniBand, these communications are slower than loading from VRAM. NVLink is still fine for tensor parallelism, but communication across nodes is quite slow.<p>What MoE enables is expert parallelism, where different nodes keep different experts in memory and don't need to communicate as much between nodes. This only works if there are enough nodes to keep all experts in VRAM with enough headroom for other things (KV cache, other weights, etc.). So the possible batch size naturally becomes quite large. And of course you want to maximize it to make sure all GPUs are actually working.</p>
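The ~300 FLOPs-per-byte figure can be checked with back-of-the-envelope arithmetic. Using approximate H100 SXM datasheet numbers (~989 TFLOPS dense BF16, ~3.35 TB/s HBM3 bandwidth; these specs are my assumption, not from the blogpost):

```python
# Roofline back-of-the-envelope for an H100 SXM (approximate specs).
peak_flops = 989e12        # dense BF16, FLOP/s
mem_bandwidth = 3.35e12    # HBM3, bytes/s

# Machine balance: FLOPs available per byte loaded from memory.
flops_per_byte = peak_flops / mem_bandwidth
print(round(flops_per_byte))  # 295, i.e. roughly 300

# At batch size 1, a matmul does ~2 FLOPs (multiply + add) per 2-byte
# BF16 weight, i.e. ~1 FLOP per byte loaded: heavily memory-bound.
# You need a batch of roughly 300 tokens per weight load to reach
# the compute roofline.
```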
]]></description><pubDate>Sun, 01 Jun 2025 12:41:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44150452</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=44150452</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44150452</guid></item><item><title><![CDATA[New comment by ipieter in "Debugging my wife's alarm clock"]]></title><description><![CDATA[
<p>The mains frequency is literally how fast the generators in power plants are turning. If the load on the grid increases, those generators slow down slightly and more natural gas/coal/heat has to be added to bring the frequency back up. This whole process is quite complicated because not every plant can react at the same speed: some plants always run at 100% capacity, while others are dedicated to regulating the frequency.<p>So there are small fluctuations, typically within 0.2 Hz of the base frequency, but the average is very close to the nominal 50/60 Hz. And for an alarm clock that is good enough.</p>
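Why the average matters more than the instantaneous frequency: a mains-synchronized clock just counts cycles and divides by the nominal frequency, so short-term wobble cancels out. A minimal sketch of that idea (illustrative only; a real clock counts zero crossings in hardware):

```python
# A mains-synchronized clock counts cycles of the 50 Hz mains and
# divides by the nominal frequency to get elapsed time.
NOMINAL_HZ = 50

def elapsed_seconds(cycles_counted: int) -> float:
    """Seconds shown by the clock after counting this many cycles."""
    return cycles_counted / NOMINAL_HZ

# One day of mains at exactly 50 Hz is 50 * 86400 cycles:
assert elapsed_seconds(50 * 86_400) == 86_400.0

# If the grid delivered 0.1% fewer cycles over a day, the clock would
# lag by about 86 seconds -- which is why operators steer the *average*
# frequency back to nominal, not just the instantaneous value.
slow_day_cycles = 50 * 86_400 - 4_320   # 0.1% fewer cycles
print(round(86_400 - elapsed_seconds(slow_day_cycles), 1))  # 86.4
```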
]]></description><pubDate>Sun, 27 Oct 2024 15:02:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=41963066</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=41963066</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41963066</guid></item><item><title><![CDATA[New comment by ipieter in "Set Up a $4/Mo Hetzner VM to Skip the Serverless Tax"]]></title><description><![CDATA[
<p>I have both DO and Hetzner VMs and I find them comparable, with Hetzner being a bit cheaper. Judging from my logs and fail2ban, DO seems to filter a bit more abusive traffic, but that is basically the only difference.<p>However, the DO docs are on a different level and of high quality. They are also accessible if you are not a customer.</p>
]]></description><pubDate>Sun, 01 Sep 2024 18:29:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=41419099</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=41419099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41419099</guid></item><item><title><![CDATA[New comment by ipieter in "Show HN: Interactive Graph by LLM (GPT-4o)"]]></title><description><![CDATA[
<p>Yeah, I just tried it with numbers I know reasonably well and the result seems totally made up. The generated chart showed a linear downward trend while in reality there isn't one, and the numbers seem way off.</p>
]]></description><pubDate>Sun, 19 May 2024 10:44:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40405925</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=40405925</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40405925</guid></item><item><title><![CDATA[New comment by ipieter in "How to Cite ChatGPT"]]></title><description><![CDATA[
<p>Our university recently recommended the same thing, and I think this is a very bad idea for two reasons.<p>1. Not every sequence of words deserves to be cited. GPT-3 and ChatGPT are often confidently wrong about facts, so why would you want to add a citation to that? When writing a paper, this needs to be fact-checked anyway, so why not cite the original (actual) source?<p>2. It also breaks the citation graph. Imagine all papers now pointing to a catch-all reference, OpenAI (2023). Adding a citation is about saying where you got certain information, and this format doesn't give you enough to do that; it just points to the catch-all. With any other citation you can either look up the paper or, in the rare case of a personal-communication citation, ask the cited source directly. You can't ask ChatGPT "hey, why did you say this in paper X" and expect a meaningful answer.</p>
]]></description><pubDate>Sun, 11 Jun 2023 15:09:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=36282098</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=36282098</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36282098</guid></item><item><title><![CDATA[New comment by ipieter in "Abandoned Motorola Headquarters (2020)"]]></title><description><![CDATA[
<p>No, it's not. Philips Lighting was split off into a new brand: Signify. They pay to use the Philips name (e.g. for Philips Hue).</p>
]]></description><pubDate>Fri, 13 Aug 2021 20:56:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=28174405</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=28174405</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28174405</guid></item><item><title><![CDATA[New comment by ipieter in "My Hunt for the Original McDonald’s French-Fry Recipe"]]></title><description><![CDATA[
<p>As a Belgian, I was looking at that recipe in horror. Why would people put sugar on their fries?!<p>So I'm glad you like Belgian fries. What I've always been taught is the key step: we fry the cut potatoes twice, once at 140°C, then wait for them to cool down, and once more at 175-180°C. Then you add salt, and that's it.<p>And the mayo: I don't think many people like it that way when they eat fries at home, but it's easy ¯\_(ツ)_/¯</p>
]]></description><pubDate>Mon, 30 Nov 2020 15:15:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=25254553</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=25254553</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=25254553</guid></item><item><title><![CDATA[New comment by ipieter in "Show HN: Generate a static website from any back end"]]></title><description><![CDATA[
<p>I can answer that, since I also made a small static site generator from scratch in Python.<p>I was using another static site generator (Hexo), but at some point I wanted to change some things and add custom HTML to my posts. Since the documentation was ... well ... minimalistic on some aspects, I also spent some time reading the source code. At that point I really started to wonder what benefit I still got from using the generator.<p>In the end, all a static site generator does is collect some Markdown or RST files, convert them to HTML, and put the result into an HTML template, plus generate some lists (index page, RSS, ...) and some metadata for SEO. So it took me a single Saturday to make a working static site generator, and now I can do anything I want without looking up documentation or source code, since it's my own dumpster fire :-)</p>
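The collect/convert/template loop described above fits in a few dozen lines. A minimal sketch, not the actual generator from this comment; the file layout and the trivial stand-in "Markdown" conversion are placeholders:

```python
# Minimal static site generator: gather *.md posts, wrap each in a
# template, and emit an index page linking them all.
from pathlib import Path

TEMPLATE = "<html><body><h1>{title}</h1>{body}</body></html>"

def md_to_html(text: str) -> str:
    # Stand-in for a real Markdown converter: paragraphs only.
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "".join(f"<p>{p}</p>" for p in paras)

def build(src: Path, out: Path) -> list[str]:
    out.mkdir(exist_ok=True)
    pages = []
    for post in sorted(src.glob("*.md")):
        # First block is the title, the rest is the body.
        title, _, body = post.read_text().partition("\n\n")
        html = TEMPLATE.format(title=title.lstrip("# "), body=md_to_html(body))
        (out / f"{post.stem}.html").write_text(html)
        pages.append(post.stem)
    # Index page listing every post.
    links = "".join(f'<li><a href="{p}.html">{p}</a></li>' for p in pages)
    (out / "index.html").write_text(
        TEMPLATE.format(title="Index", body=f"<ul>{links}</ul>"))
    return pages
```

A real version would swap `md_to_html` for a proper Markdown library and add an RSS template, but the overall shape stays this small.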
]]></description><pubDate>Sat, 02 May 2020 09:12:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=23050479</link><dc:creator>ipieter</dc:creator><comments>https://news.ycombinator.com/item?id=23050479</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=23050479</guid></item></channel></rss>