<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: samhoss93</title><link>https://news.ycombinator.com/user?id=samhoss93</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 10 Jun 2026 14:37:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=samhoss93" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by samhoss93 in "Show HN: Piqc – An open-source GPU waste scanner for LLM inference clusters"]]></title><description><![CDATA[
<p>piqc scans your Kubernetes cluster (read-only) and identifies which models are running on the wrong GPU tier and what the cost attribution is. It runs in a minute. I'd like to hear the community's experiences with and thoughts on our detection approach and its benefits.</p>
]]></description><pubDate>Fri, 05 Jun 2026 14:18:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48412911</link><dc:creator>samhoss93</dc:creator><comments>https://news.ycombinator.com/item?id=48412911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48412911</guid></item><item><title><![CDATA[Show HN: Piqc – An open-source GPU waste scanner for LLM inference clusters]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/paralleliq/piqc">https://github.com/paralleliq/piqc</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48412542">https://news.ycombinator.com/item?id=48412542</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 05 Jun 2026 13:48:48 +0000</pubDate><link>https://github.com/paralleliq/piqc</link><dc:creator>samhoss93</dc:creator><comments>https://news.ycombinator.com/item?id=48412542</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48412542</guid></item><item><title><![CDATA[New comment by samhoss93 in "Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA"]]></title><description><![CDATA[
<p>Great README. Genuinely one of the clearest walkthroughs of inference internals. The KV cache section is worth lingering on, as most OOM and throughput issues trace back to it and it is normally difficult to reason about. Sequence length and batch size fill the cache in ways that show up under real traffic.</p><p>Looking forward to going over the completed course.</p>
]]></description><pubDate>Mon, 01 Jun 2026 22:44:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48363579</link><dc:creator>samhoss93</dc:creator><comments>https://news.ycombinator.com/item?id=48363579</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48363579</guid></item></channel></rss>