<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: blagui</title><link>https://news.ycombinator.com/user?id=blagui</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 00:11:09 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=blagui" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by blagui in "LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active"]]></title><description><![CDATA[
<p>Too big to be hosted and used locally unless you have some prod servers under your desk.<p>As for those aiming to make it fit with Q2 or Q1 quantization: it's not worth destroying the model just to claim it's still alive after cutting off all its limbs.</p>
]]></description><pubDate>Tue, 30 Jun 2026 13:18:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48732385</link><dc:creator>blagui</dc:creator><comments>https://news.ycombinator.com/item?id=48732385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48732385</guid></item><item><title><![CDATA[New comment by blagui in "Qwen 3.6 27B is the sweet spot for local development"]]></title><description><![CDATA[
<p>So the sweet spot for dev in 2026 is a 64k context window? Are we back in 2024?<p>More context will degrade the t/s a lot. On top of that, this is with 1 slot.<p>If you use sub agents, the KV cache will be invalidated by colliding requests, making it even slower.<p>So in the real world, with 256k context (the max Qwen offers) and 3-4 slots, the numbers are very different.<p>This is the major issue with so many posts about local models: they don't benchmark real-world use with realistic context sizes, and they don't take this into account.<p>If you use 1 slot, you lose the ability to use sub agents when exploring, so everything ends up in the main agent's context, overloading it and triggering compaction. And oh boy, with 64k context that compaction will be an endless loop.<p>What tasks would you really be able to do with 64k context and 1 agent? For sure some quick edits, but not complex planning where you need to ingest a lot of files and end up losing 80% of the ingested files to compaction.</p>
]]></description><pubDate>Tue, 30 Jun 2026 13:16:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48732352</link><dc:creator>blagui</dc:creator><comments>https://news.ycombinator.com/item?id=48732352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48732352</guid></item><item><title><![CDATA[New comment by blagui in "Qwen 3.6 27B is the sweet spot for local development"]]></title><description><![CDATA[
<p>How can you do dev in 2026 using 64k context and without sub agents?<p>The benchmark seemed fine until I saw that.<p>If you use sub agents, they will overwrite the cache and each request will trigger full reprocessing. Have fun with that, as it will crash the t/s metrics on each prefill. On top of that, the 64k max, which includes input + output, is a major blocker.<p>If you push the context higher and add parallel slots, the requirements will be far higher and the numbers less shiny.</p>
]]></description><pubDate>Mon, 29 Jun 2026 22:24:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48726135</link><dc:creator>blagui</dc:creator><comments>https://news.ycombinator.com/item?id=48726135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48726135</guid></item></channel></rss>