<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: evangambit</title><link>https://news.ycombinator.com/user?id=evangambit</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 16:49:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=evangambit" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by evangambit in "Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs"]]></title><description><![CDATA[
<p>Even with the same embedding sizes and vocabularies, nothing forces dimension 1 of model 1 to mean the same thing as dimension 1 of model 2. There are many ways to permute a model’s dimensions without changing its output, so whatever dimension 1 means the first time you train a model is just as likely to end up as dimension 2 the second time you train as it is to stay consistent with the first model.</p><p>Nobody here or on Reddit has mentioned this, maybe because it’s too obvious, but it’s clear to me that the residual connections are an absolutely necessary component of making this merging possible: they are the only reason dimension 1 of a later layer is encouraged to mean something similar to dimension 1 of an earlier layer.</p>
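<p>The permutation symmetry is easy to demonstrate concretely. Here is a minimal sketch (my own toy example, not anything from the linked post) showing that reordering the hidden units of a two-layer MLP, by permuting the rows of the first weight matrix and the columns of the second consistently, leaves the output unchanged:</p>

```python
import numpy as np

# Toy 2-layer MLP: 4 inputs -> 8 hidden units (ReLU) -> 3 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

# Pick an arbitrary reordering of the 8 hidden units and apply it to
# the rows of W1/b1 and the matching columns of W2.
perm = rng.permutation(8)
x = rng.normal(size=4)
y_orig = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

# The two networks compute identical functions.
assert np.allclose(y_orig, y_perm)
```

<p>Two independently trained runs effectively land on different, equally valid permutations like this, which is why naive weight averaging fails without something (like residual streams) tying the dimension order together across layers.</p>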
]]></description><pubDate>Thu, 12 Mar 2026 01:04:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344921</link><dc:creator>evangambit</dc:creator><comments>https://news.ycombinator.com/item?id=47344921</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344921</guid></item></channel></rss>