<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: za_mike157</title><link>https://news.ycombinator.com/user?id=za_mike157</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 04:42:58 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=za_mike157" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by za_mike157 in "Reduce GVisor Cold Starts with GPU Snapshotting"]]></title><description><![CDATA[
<p>Interesting! I didn't see that they had released this. Do you know what their benchmarks are? I know that on Cloud Run they are pretty slow.</p>
]]></description><pubDate>Wed, 01 Jul 2026 19:27:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48751930</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=48751930</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48751930</guid></item><item><title><![CDATA[New comment by za_mike157 in "Reduce GVisor Cold Starts with GPU Snapshotting"]]></title><description><![CDATA[
<p>We and the team from Modal have been upstreaming changes to the gVisor repo (<a href="https://github.com/google/gvisor/pulls" rel="nofollow">https://github.com/google/gvisor/pulls</a>) to make it compatible with cuda-checkpoint and other parts of our system. While we are both contributing fixes and performance improvements, we are unfortunately keeping some secret sauce to ourselves, but hopefully what is upstream should get most folks to a successful implementation as is.</p>
]]></description><pubDate>Wed, 01 Jul 2026 18:33:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48751314</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=48751314</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48751314</guid></item><item><title><![CDATA[New comment by za_mike157 in "Reduce GVisor Cold Starts with GPU Snapshotting"]]></title><description><![CDATA[
<p>haha you are right that the title is a bit strange - should just be "Reduce GPU cold starts with snapshotting"<p>I can't read good ;)</p>
]]></description><pubDate>Wed, 01 Jul 2026 18:30:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48751241</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=48751241</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48751241</guid></item><item><title><![CDATA[New comment by za_mike157 in "Reduce GVisor Cold Starts with GPU Snapshotting"]]></title><description><![CDATA[
<p>No, we don't use it. CRIU is used for normal checkpoint/restore of Linux processes. Since we run gVisor for container isolation, we use its checkpoint/restore support for the sandboxed process state.<p>Both approaches still need NVIDIA’s cuda-checkpoint for the GPU side, because CUDA/GPU memory and driver state are not something a normal process checkpointing tool can handle on its own.</p>
]]></description><pubDate>Wed, 01 Jul 2026 17:25:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48750315</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=48750315</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48750315</guid></item><item><title><![CDATA[New comment by za_mike157 in "Reduce GVisor Cold Starts with GPU Snapshotting"]]></title><description><![CDATA[
<p>There are a lot of similarities.<p>They run their snapshot agent as a Kubernetes DaemonSet, whereas our implementation runs as part of the Cerebrium container runtime path. Under the hood, both approaches rely on cuda-checkpoint, since cuda-checkpoint is currently the main primitive NVIDIA exposes for interacting with GPU memory during checkpoint/restore.<p>One difference is how KV cache handling is exposed. NVIDIA’s approach appears to automatically handle KV cache allocation/deallocation, whereas today we expose that choice to users (vLLM and SGLang expose primitives to do this). In some cases, users may want to discard the KV cache to reduce checkpoint size and restore time; in others, preserving it may be useful.<p>Their DaemonSet approach is also nice because it can be more portable across Kubernetes environments and clouds. Our approach is more deeply integrated into the node/runtime layer, which gives us tighter control over the serverless startup path, but also means it depends on custom node VM images, which not every provider supports equally.<p>The optimizations they mention around parallel memfd restore and Linux native AIO for anonymous memory could also be applied to our architecture if we find them stable and beneficial. That said, our current results are already pretty close. For example, they report restoring Qwen3-8B in 4.7s with those changes, while we currently restore it in 6.49s.<p>The thing we are most excited about is multi-GPU restore, which is not supported yet. That would unlock a much broader set of workloads.</p>
]]></description><pubDate>Wed, 01 Jul 2026 17:19:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=48750226</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=48750226</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48750226</guid></item><item><title><![CDATA[New comment by za_mike157 in "Reduce GVisor Cold Starts with GPU Snapshotting"]]></title><description><![CDATA[
<p>Hey! Yes, you are correct! We have both been upstreaming changes to the main gVisor repo. However, to work within our own infrastructure we had to make various changes that we explain throughout the article (open TCP connections, multiprocessing, Unix sockets, etc.).<p>Also, in our benchmarks we seem to perform better than Modal by ~20% in 4 of the 6 workloads we tested, with a lower spread of results, meaning you get more consistent performance. However, the same fundamentals still apply: how can you move data from storage into memory as quickly as possible?</p>
]]></description><pubDate>Wed, 01 Jul 2026 16:50:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48749758</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=48749758</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48749758</guid></item><item><title><![CDATA[Why Kubernetes Serving Breaks Down for Real-Time AI]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.cerebrium.ai/blog/why-kubernetes-serving-breaks-down-for-realtime-ai">https://www.cerebrium.ai/blog/why-kubernetes-serving-breaks-down-for-realtime-ai</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47504872">https://news.ycombinator.com/item?id=47504872</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 24 Mar 2026 16:11:14 +0000</pubDate><link>https://www.cerebrium.ai/blog/why-kubernetes-serving-breaks-down-for-realtime-ai</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=47504872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47504872</guid></item><item><title><![CDATA[New comment by za_mike157 in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>Glad you liked it!</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:09:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47313887</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=47313887</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47313887</guid></item><item><title><![CDATA[New comment by za_mike157 in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>You are correct! In our tests, storing model weights in the image isn't the preferred approach once the weights are larger than ~1GB. We run a distributed, multi-layer cache system to combat this, and we can load roughly 6-7GB of files with a p99 of under 2.5s.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:09:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47313883</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=47313883</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47313883</guid></item><item><title><![CDATA[New comment by za_mike157 in "The 1979 Design Choice Breaking AI Workloads"]]></title><description><![CDATA[
<p>A lot of AI workloads require GPUs, which are expensive, so customers would waste money running idle machines 24/7 at low utilisation, which kills gross margins. Loading containers quickly means we can scale up as requests come in, so you only need to pay for usage.<p>This already works well for CPU workloads (AWS Lambda), but AI models and images are 50x the size.</p>
]]></description><pubDate>Mon, 09 Mar 2026 19:06:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47313839</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=47313839</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47313839</guid></item><item><title><![CDATA[The 1979 Design Choice Breaking AI Workloads]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.cerebrium.ai/blog/rethinking-container-image-distribution-to-eliminate-cold-starts">https://www.cerebrium.ai/blog/rethinking-container-image-distribution-to-eliminate-cold-starts</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47311745">https://news.ycombinator.com/item?id=47311745</a></p>
<p>Points: 25</p>
<p># Comments: 20</p>
]]></description><pubDate>Mon, 09 Mar 2026 16:59:05 +0000</pubDate><link>https://www.cerebrium.ai/blog/rethinking-container-image-distribution-to-eliminate-cold-starts</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=47311745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47311745</guid></item><item><title><![CDATA[AI Companies need to partner with Serverless compute platforms vs. K8s]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.cerebrium.ai/blog/why-serverless-compute-partners-are-now-more-important-than-ever">https://www.cerebrium.ai/blog/why-serverless-compute-partners-are-now-more-important-than-ever</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47222679">https://news.ycombinator.com/item?id=47222679</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 02 Mar 2026 19:16:28 +0000</pubDate><link>https://www.cerebrium.ai/blog/why-serverless-compute-partners-are-now-more-important-than-ever</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=47222679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47222679</guid></item><item><title><![CDATA[New comment by za_mike157 in "How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference"]]></title><description><![CDATA[
<p>Hey! Founder of Cerebrium here.<p>- Runpod is one of the cheapest but it comes at the price of reliability (critical for businesses)
- We have faster cold starts, with something special launching soon here
- Iterating on your application using CPUs/GPUs in the cloud takes just 2–10 seconds, compared to several minutes with Runpod due to Docker push/pull.
- We let you deploy in multiple regions globally for lower latency and data residency compliance
- We provide a lot of software abstractions (fire-and-forget jobs, websockets, batching, etc.), whereas Runpod just deploys your Docker image.
- SOC 2 and GDPR compliant<p>With all that being said, we are working on optimisations to bring down pricing.</p>
]]></description><pubDate>Tue, 22 Jul 2025 14:19:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44647227</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=44647227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44647227</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>I haven't used SkyPilot, so I am unfamiliar with the experience and performance.<p>However, some situations where you might prefer Cerebrium over SkyPilot are:
- You don't want to manage your own hardware
- Reduced costs: a serverless runtime with low cold starts (unclear if SkyPilot offers this and what the performance is like if they do)
- Rapid iteration: it is unclear what the deployment process on SkyPilot looks like and how long projects take to go live
- Observability: it looks like you would just have k8s metrics at your disposal</p>
]]></description><pubDate>Thu, 19 Sep 2024 11:56:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=41590861</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41590861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41590861</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>I think we used this UI kit: <a href="https://minimals.cc/" rel="nofollow">https://minimals.cc/</a></p>
]]></description><pubDate>Thu, 19 Sep 2024 11:11:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=41590599</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41590599</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41590599</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>I guess the next question would then be: how quickly can they start executing your container from a cold start when a workload comes in? Typically we see companies at around 30-60s.</p>
]]></description><pubDate>Thu, 19 Sep 2024 11:10:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=41590598</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41590598</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41590598</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>Do you mean why the individual file names aren't quoted?<p>You can see an example config file at the bottom of that link you attached - agreed we should probably make it more obvious</p>
]]></description><pubDate>Thu, 19 Sep 2024 00:51:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=41587313</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41587313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41587313</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>Thanks for confirming! Our cold start, excluding model load, is typically 2-4 seconds for HF models.<p>The only time it gets much longer is when companies have done a lot with very specific CUDA implementations.</p>
]]></description><pubDate>Wed, 18 Sep 2024 19:11:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=41584189</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41584189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41584189</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>Thanks Tom! Excited to support you and the team as you grow</p>
]]></description><pubDate>Wed, 18 Sep 2024 18:27:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=41583661</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41583661</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41583661</guid></item><item><title><![CDATA[New comment by za_mike157 in "Launch HN: Cerebrium (YC W22) – Serverless Infrastructure Platform for ML/AI"]]></title><description><![CDATA[
<p>Ah I see they recently cut their pricing by 40% so you are correct - sorry about that. It seems we are more expensive compared to their new pricing</p>
]]></description><pubDate>Wed, 18 Sep 2024 18:03:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=41583358</link><dc:creator>za_mike157</dc:creator><comments>https://news.ycombinator.com/item?id=41583358</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41583358</guid></item></channel></rss>