<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: evangambit</title><link>https://news.ycombinator.com/user?id=evangambit</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 16:49:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=evangambit" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by evangambit in "Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs"]]></title><description><![CDATA[
<p>Even with the same embedding sizes and vocabularies, nothing forces dimension 1 of model 1 to mean the same thing as dimension 1 of model 2. There are many ways to permute a model’s dimensions without changing its output, so whatever dimension 1 means the first time you train a model is just as likely to end up as dimension 2 the second time you train as it is to stay consistent with the first model.</p><p>Nobody here or on Reddit has mentioned this, maybe because it’s too obvious, but it’s clear to me that the residual connections are an absolutely necessary component of making this merging possible: they are the only reason dimension 1 of a later layer is encouraged to mean something similar to dimension 1 of an earlier layer.</p>
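<p>The permutation symmetry is easy to demonstrate concretely. Here is a minimal sketch (my own toy example, not anything from the linked post) showing that reordering the hidden units of a two-layer MLP, by permuting the rows of the first weight matrix and the columns of the second consistently, leaves the output unchanged:</p>

```python
import numpy as np

# Toy 2-layer MLP: 4 inputs -> 8 hidden units (ReLU) -> 3 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

# Pick an arbitrary reordering of the 8 hidden units and apply it to
# the rows of W1/b1 and the matching columns of W2.
perm = rng.permutation(8)
x = rng.normal(size=4)
y_orig = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

# The two networks compute identical functions.
assert np.allclose(y_orig, y_perm)
```

<p>Two independently trained runs effectively land on different, equally valid permutations like this, which is why naive weight averaging fails without something (like residual streams) tying the dimension order together across layers.</p>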
]]></description><pubDate>Thu, 12 Mar 2026 01:04:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344921</link><dc:creator>evangambit</dc:creator><comments>https://news.ycombinator.com/item?id=47344921</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344921</guid></item></channel></rss>