<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ow5</title><link>https://news.ycombinator.com/user?id=ow5</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 05 May 2026 08:35:23 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ow5" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ow5 in "Lossless LLM compression for efficient GPU inference via dynamic-length float"]]></title><description><![CDATA[
<p>Hi! One of the contributors to the paper here. We have kernels, not yet released, that can shave decoding latency by >20%.</p><p>Also, when we ran streaming experiments with the current kernels, our inference was a median of ~1.3x slower.</p>
]]></description><pubDate>Fri, 25 Apr 2025 20:39:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43798291</link><dc:creator>ow5</dc:creator><comments>https://news.ycombinator.com/item?id=43798291</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43798291</guid></item></channel></rss>