<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tgw43279w</title><link>https://news.ycombinator.com/user?id=tgw43279w</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 25 Apr 2026 12:47:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tgw43279w" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tgw43279w in "Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs"]]></title><description><![CDATA[
<p>Very cool, thanks for sharing! Recovering 96% using just two blocks on IMN-1k, wow!</p>
]]></description><pubDate>Tue, 10 Mar 2026 15:24:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47324540</link><dc:creator>tgw43279w</dc:creator><comments>https://news.ycombinator.com/item?id=47324540</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47324540</guid></item><item><title><![CDATA[New comment by tgw43279w in "Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs"]]></title><description><![CDATA[
<p>That was a fun read! The base64 decoding and encoding is quite interesting. A parallel: these models are surprisingly robust to heavy word mangling. Back in 2023, people often used this trick to jailbreak models, and what was more surprising is that the models even understood the mangled text. I always thought of it this way: there must be some circuitry in the model that maps these almost unrecognizable words/sentences onto their rectified versions. What your base64 experiment shows is that they can also encode in the other direction! (However, models are known to be unable to produce mangled output that looks convincingly random. I think the base64 transformation is more mechanical in this regard, and hence it's easier for them to reverse.)
So your layer circuit hypothesis aligns well with my mental model of how these models work, based on the interpretability work I'm familiar with! I also really like the way you used heatmaps as a tool to derive layer insights; very intuitive! And it's genuinely surprising that you can simply duplicate layers and achieve better results that generalize!
This is research-grade effort! I'm confident you could publish this at NeurIPS or ICML if you wrote it up as a paper. I'm quite impressed. Great work!</p>
]]></description><pubDate>Tue, 10 Mar 2026 14:12:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47323520</link><dc:creator>tgw43279w</dc:creator><comments>https://news.ycombinator.com/item?id=47323520</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47323520</guid></item><item><title><![CDATA[New comment by tgw43279w in "Show HN: Semantic Grep – A Word2Vec-powered search tool"]]></title><description><![CDATA[
<p>I really like how simple the implementation is!</p>
]]></description><pubDate>Sun, 28 Jul 2024 11:24:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=41092515</link><dc:creator>tgw43279w</dc:creator><comments>https://news.ycombinator.com/item?id=41092515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41092515</guid></item><item><title><![CDATA[New comment by tgw43279w in "Linear Book Scanner – Open-source automatic book scanner (2014)"]]></title><description><![CDATA[
<p>Regarding your point about a successor to LaTeX: <a href="https://typst.app/" rel="nofollow noreferrer">https://typst.app/</a> is turning out to be great.</p>
]]></description><pubDate>Sun, 17 Sep 2023 16:41:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=37546953</link><dc:creator>tgw43279w</dc:creator><comments>https://news.ycombinator.com/item?id=37546953</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37546953</guid></item></channel></rss>