<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jono_irwin</title><link>https://news.ycombinator.com/user?id=jono_irwin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 03:13:00 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jono_irwin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Reduce GVisor Cold Starts with GPU Snapshotting]]></title><description><![CDATA[
<p>Article URL: <a href="https://cerebrium.ai/blog/reducing-gpu-cold-starts-with-memory-snapshots-restoring-cuda-workloads-in-second">https://cerebrium.ai/blog/reducing-gpu-cold-starts-with-memory-snapshots-restoring-cuda-workloads-in-second</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48749313">https://news.ycombinator.com/item?id=48749313</a></p>
<p>Points: 48</p>
<p># Comments: 15</p>
]]></description><pubDate>Wed, 01 Jul 2026 16:19:47 +0000</pubDate><link>https://cerebrium.ai/blog/reducing-gpu-cold-starts-with-memory-snapshots-restoring-cuda-workloads-in-second</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=48749313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48749313</guid></item><item><title><![CDATA[New comment by jono_irwin in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>Yeah that’s fair. For weights specifically there often isn’t a huge dedupe win across versions since retraining tends to change most of them. That said, we generally don’t advocate including model weights in container images anyway. The main benefit for us is avoiding the need to pull the full image up front and only fetching the data actually touched during startup. On the latency side, reads happen over a local network with caching and prefetching, so the impact on request latency is typically minimal.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:45:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47314410</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=47314410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47314410</guid></item><item><title><![CDATA[New comment by jono_irwin in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>That approach works really well when you have a stable shared base image.<p>Where it starts to get harder is when you have multiple base stacks (different CUDA versions, frameworks, etc.) or when you need to update them frequently. You end up with lots of slightly different multi-GB bases.<p>Chunked images keep the benefit you mentioned (we still cache heavily on the nodes) but the caching happens at a finer granularity. That makes it much more tolerant of small differences between images and of frequent updates, since unchanged chunks can still be reused.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:25:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47314133</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=47314133</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47314133</guid></item><item><title><![CDATA[New comment by jono_irwin in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>Good point, network dependency is a valid concern.<p>In practice these systems typically fetch data over a local, highly available network and aggressively cache anything that gets read. If that network path becomes unavailable, it usually indicates a much larger infrastructure issue since many other parts of the system rely on the same storage or registry endpoints.<p>So while it does introduce a different failure mode, in most production environments it ends up being a low practical risk compared to the startup latency improvements.<p>For us and our customers, the trade-off is worth it.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:18:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47314031</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=47314031</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47314031</guid></item><item><title><![CDATA[New comment by jono_irwin in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>hey cosmotic, we're not really advocating for storing model weights in the container image.<p>even the smaller nvidia images (like nvidia/cuda:13.1.1-cudnn-runtime-ubuntu24.04) are about 2 GB before adding any python deps and that is a problem.<p>if you split the image into chunks and pull on demand, your container will start much faster.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:07:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47313857</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=47313857</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47313857</guid></item><item><title><![CDATA[New comment by jono_irwin in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>Thanks for the feedback! I like the sound of all of those:<p>- clearer messaging
- more tutorials
- one-click deploys
- clear & upfront costing<p>We have plans to add other runtimes (like TypeScript) in the future but Python is our focus for now.</p>
]]></description><pubDate>Thu, 19 Sep 2024 03:39:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=41588263</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=41588263</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41588263</guid></item><item><title><![CDATA[New comment by jono_irwin in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>There are definitely some parallels between Cerebrium and Paperspace, but I don't think they are a direct competitor. The biggest difference is that Paperspace doesn't have a serverless offering afaik.<p>Cerebrium abstracts some functionality, like streaming and batching endpoints. I think you would need to build that yourself on Paperspace.</p>
]]></description><pubDate>Wed, 18 Sep 2024 21:23:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=41585717</link><dc:creator>jono_irwin</dc:creator><comments>https://news.ycombinator.com/item?id=41585717</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41585717</guid></item></channel></rss>