<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tatrions</title><link>https://news.ycombinator.com/user?id=tatrions</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 06 Apr 2026 02:41:51 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tatrions" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tatrions in "Embarrassingly simple self-distillation improves code generation"]]></title><description><![CDATA[
<p>Token-level entropy is a pretty clean proxy for detecting fork vs. lock positions at inference time: low entropy = lock (decode greedily), high entropy = fork (sample at higher temperature). Speculative decoding already exploits something similar: the small draft model handles the predictable tokens, and the big model kicks in at the uncertain ones. Combining that with this paper's fork/lock framing could get you adaptive temperature basically for free during inference.</p>
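<p>A minimal sketch of the entropy gate described above. The threshold and fork temperature are made-up illustrative values, not something from the paper, and this stands in for whatever logits your model actually emits:</p>

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over raw logits.
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    # Shannon entropy (in nats) of a probability distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_token(logits, lock_threshold=0.5, fork_temperature=1.3, rng=random):
    # Entropy-gated decoding: greedy at "lock" positions,
    # hotter sampling at "fork" positions. Hypothetical knob values.
    probs = softmax(logits)
    if entropy(probs) < lock_threshold:
        # Lock: the model is confident, so decode greedily.
        return max(range(len(probs)), key=probs.__getitem__)
    # Fork: sample from a flattened, higher-temperature distribution.
    hot = softmax(logits, temperature=fork_temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(hot):
        acc += p
        if r <= acc:
            return i
    return len(hot) - 1
```

<p>The same gate drops into a speculative-decoding loop: accept draft tokens at lock positions, defer to the big model at fork positions.</p>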
]]></description><pubDate>Sun, 05 Apr 2026 18:54:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47652636</link><dc:creator>tatrions</dc:creator><comments>https://news.ycombinator.com/item?id=47652636</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47652636</guid></item></channel></rss>