<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: zackangelo</title><link>https://news.ycombinator.com/user?id=zackangelo</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 20:10:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=zackangelo" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by zackangelo in "Kimi K2.7-Code: open-source coding model with better token efficiency"]]></title><description><![CDATA[
<p>I don't believe safetensors has a native int4 dtype, so they packed 4 int4s into a bf16 in this checkpoint.</p>
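<p>A minimal sketch of the packing idea, assuming unsigned 4-bit values stored low nibble first in a 16-bit container. The function names and the nibble order are hypothetical, and the actual checkpoint layout may differ.</p>

```python
def pack_nibbles(values):
    """Pack four unsigned 4-bit values into one 16-bit word.

    Assumption: the first value lands in the lowest nibble.
    """
    if len(values) != 4 or any(v not in range(16) for v in values):
        raise ValueError("expected four values in the range 0..15")
    word = 0
    # Walk the values back to front so the first one ends up lowest.
    for v in reversed(values):
        word = word * 16 + v
    return word


def unpack_nibbles(word):
    """Recover the four 4-bit values from a 16-bit word."""
    out = []
    for _ in range(4):
        word, nibble = divmod(word, 16)
        out.append(nibble)
    return out


packed = pack_nibbles([1, 2, 3, 4])
restored = unpack_nibbles(packed)
```

<p>The 16 bits are only a container here: a loader would reinterpret them as four integers and would not read them as one floating point value.</p>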
]]></description><pubDate>Fri, 12 Jun 2026 17:51:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48507268</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=48507268</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48507268</guid></item><item><title><![CDATA[New comment by zackangelo in "The real cost of owning a home"]]></title><description><![CDATA[
<p>If you're in SF and weighing this decision, it's easy to get tilted in the buy direction because the rental stock is so horrific. Landlords have very little incentive to update properties or provide basic amenities that people take for granted in other major cities (good luck getting a washer/dryer).</p>
]]></description><pubDate>Tue, 26 May 2026 16:58:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48282437</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=48282437</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48282437</guid></item><item><title><![CDATA[New comment by zackangelo in "Qwen3.7-Max: The Agent Frontier"]]></title><description><![CDATA[
<p>With the 3.5 release, the Plus model was just a rebrand of the open weight 397B. But I suspect that will change going forward. They haven’t released the weights for 3.6 but they did make it available through a few US providers.</p>
]]></description><pubDate>Wed, 20 May 2026 15:09:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209099</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=48209099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209099</guid></item><item><title><![CDATA[New comment by zackangelo in "I’ve joined Anthropic"]]></title><description><![CDATA[
<p>absolutely not, take Kimi K2.6 for a spin</p>
]]></description><pubDate>Tue, 19 May 2026 17:29:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48196381</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=48196381</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48196381</guid></item><item><title><![CDATA[How do agents see your website?]]></title><description><![CDATA[
<p>Article URL: <a href="https://what-do-agents-see.runtype.app/">https://what-do-agents-see.runtype.app/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48123838">https://news.ycombinator.com/item?id=48123838</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 13 May 2026 16:11:19 +0000</pubDate><link>https://what-do-agents-see.runtype.app/</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=48123838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48123838</guid></item><item><title><![CDATA[New comment by zackangelo in "Mistral Medium 3.5"]]></title><description><![CDATA[
<p>Isn't Kimi K2.6 natively INT4?</p>
]]></description><pubDate>Wed, 29 Apr 2026 17:56:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47951935</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=47951935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47951935</guid></item><item><title><![CDATA[New comment by zackangelo in "HashiCorp co-founder says GitHub 'no longer a place for serious work'"]]></title><description><![CDATA[
<p>I don’t think this is true across Blizzard. Overwatch is the best it’s ever been.</p>
]]></description><pubDate>Wed, 29 Apr 2026 12:54:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47947665</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=47947665</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47947665</guid></item><item><title><![CDATA[New comment by zackangelo in "Parallel agents in Zed"]]></title><description><![CDATA[
<p>I give them a try about twice a year. I write a lot of Rust, which should be squarely in their wheelhouse.<p>This last time I was pleasantly surprised to find they mostly fixed their SSH remote editing support. But then it started truncating rustc inline error messages and I couldn’t figure out how to view the whole thing easily. When you’re just trying to get something done, little bits like this can add up quickly. Punted back to Cursor for now.</p>
]]></description><pubDate>Wed, 22 Apr 2026 23:35:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47870612</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=47870612</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47870612</guid></item><item><title><![CDATA[New comment by zackangelo in "Qwen3.6-35B-A3B: Agentic coding power, now open to all"]]></title><description><![CDATA[
<p>They are but the IDE needs to be integrated with them.<p>Qwen specifically calls out FIM (“fill in the middle”) support on the model card and you can see it getting confused and posting the control tokens in the example here.</p>
]]></description><pubDate>Thu, 16 Apr 2026 15:00:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47794134</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=47794134</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47794134</guid></item><item><title><![CDATA[New comment by zackangelo in "Qwen3.6-35B-A3B: Agentic coding power, now open to all"]]></title><description><![CDATA[
<p>17b per token. So when you’re generating a single stream of text (“decoding”) 17b parameters are active.<p>If you’re decoding multiple streams, it will be 17b per stream (some tokens will use the same expert, so there is some overlap).<p>When the model is ingesting the prompt (“prefilling”) it’s looking at many tokens at once, so the number of active parameters will be larger.</p>
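<p>A toy sketch of the overlap point, with made-up numbers: the parameter counts, expert indices, and the function name are all hypothetical. It counts the shared weights once, plus each distinct expert touched in a step once.</p>

```python
def active_params(expert_choices, shared_params, params_per_expert):
    """Parameters touched in one step.

    expert_choices holds one tuple of expert indices per token in the step.
    Shared weights count once, and each distinct expert counts once.
    """
    distinct = set()
    for chosen in expert_choices:
        distinct.update(chosen)
    return shared_params + len(distinct) * params_per_expert


# Toy sizes: 10 units of shared weights, 2 units per expert.
one_stream = active_params([(0, 3)], shared_params=10, params_per_expert=2)
# Two streams decoding together; both happen to pick expert 3.
two_streams = active_params([(0, 3), (3, 5)], shared_params=10, params_per_expert=2)
```

<p>With these toy numbers, two streams touch 16 units where two independent runs would touch 28, because the shared weights and expert 3 are only counted once. Prefill follows the same logic with many more tokens per step.</p>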
]]></description><pubDate>Thu, 16 Apr 2026 14:57:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47794079</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=47794079</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47794079</guid></item><item><title><![CDATA[New comment by zackangelo in "GPU memory snapshots: sub-second startup (2025)"]]></title><description><![CDATA[
<p>This uses Nvidia’s CUDA snapshot API under the hood, but you have to pair it with a host side snapshot as well. Modal uses gVisor for this, which is notoriously high overhead.<p>Does anyone know of a more efficient alternative if you’re running a trusted container?</p>
]]></description><pubDate>Sat, 10 Jan 2026 23:56:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46571234</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=46571234</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46571234</guid></item><item><title><![CDATA[New comment by zackangelo in "macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt"]]></title><description><![CDATA[
<p>You’re right, I misunderstood.<p>I’m not sure if it would be of much utility because this would presumably be for tensor parallel workloads. In that case you want the ranks in your cluster to be uniform or else everything will be forced to run at the speed of the slowest rank.<p>You could run pipeline parallel, but I’m not sure it’d be that much better than what we already have.</p>
]]></description><pubDate>Fri, 12 Dec 2025 23:26:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46250310</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=46250310</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46250310</guid></item><item><title><![CDATA[New comment by zackangelo in "macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt"]]></title><description><![CDATA[
<p>Sparks are built for this and actually have Connect-X 7 NICs built in! You just need to get the SFPs for them. This means you can natively cluster them at 200Gbps.</p>
]]></description><pubDate>Fri, 12 Dec 2025 23:05:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=46250135</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=46250135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46250135</guid></item><item><title><![CDATA[New comment by zackangelo in "macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt"]]></title><description><![CDATA[
<p>No, you use tensor parallelism in both cases.<p>The way it typically works in an attention block is: smaller portions of the Q, K and V linear layers are assigned to each node and are processed independently. Attention, RoPE, norm, etc. are run on the node-specific output of that. Then, when the output linear layer is applied, an "all reduce" is computed, which combines the outputs of all the nodes.<p>EDIT: just realized it wasn't clear -- this means that each node ends up holding a portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA, where there are fewer KV heads than ranks, you end up having to do some replication).</p>
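<p>A rough sketch of the sharding pattern described above, in plain Python with tiny made-up matrices. It leaves out attention itself and only shows the input projection split by columns, the output projection split by rows, and the sum standing in for the all reduce. All names and values are hypothetical.</p>

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


x = [[1, 2]]                               # one token, hidden size 2
w_in = [[1, 0, 2, 1], [0, 1, 1, 3]]        # input projection, 2 by 4
w_out = [[1, 0], [0, 1], [2, 1], [1, 1]]   # output projection, 4 by 2

# Reference result computed on a single node.
full = matmul(matmul(x, w_in), w_out)


def shard_output(rank, width=2):
    """Partial output from one rank.

    The rank holds a column slice of w_in and the matching row slice of w_out.
    """
    cols = slice(rank * width, (rank + 1) * width)
    w_in_local = [row[cols] for row in w_in]
    w_out_local = w_out[cols]
    return matmul(matmul(x, w_in_local), w_out_local)


# Summing the partial outputs plays the role of the all reduce.
reduced = add(shard_output(0), shard_output(1))
```

<p>Here reduced matches full, so each rank can work on its own slice without communication until the final sum. In a real attention block the per-head attention computation would sit between the two projections on each rank.</p>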
]]></description><pubDate>Fri, 12 Dec 2025 23:00:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=46250099</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=46250099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46250099</guid></item><item><title><![CDATA[New comment by zackangelo in "Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model"]]></title><description><![CDATA[
<p>What 1T parameter base model have you seen from any of those labs?</p>
]]></description><pubDate>Fri, 07 Nov 2025 03:41:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=45843360</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=45843360</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45843360</guid></item><item><title><![CDATA[New comment by zackangelo in "NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference"]]></title><description><![CDATA[
<p>Wouldn't you be able to test nccl if you had 2 of these?</p>
]]></description><pubDate>Wed, 15 Oct 2025 03:39:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45587856</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=45587856</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45587856</guid></item><item><title><![CDATA[New comment by zackangelo in "Launch HN: LlamaFarm (YC W22) – Open-source framework for distributed AI"]]></title><description><![CDATA[
<p>Just a bit of feedback:<p>> Instead of one brittle giant, we orchestrate a Mixture of Experts…<p>“mixture of experts” is a specific term of art that describes an architectural detail of a type of transformer model. It’s definitely not using smaller specialized models for individual tasks. Experts in an MoE model are actually routed to on a per token basis, not on a per task or per generation basis.<p>I know it’s tempting to co-opt this term because it would fit nicely for what you’re trying to do but it just adds confusion.</p>
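<p>To make the per token point concrete, here is a minimal sketch of top-k routing, assuming the router has already produced one score per expert for each token. The scores, the value of k, and the function name are made up for illustration.</p>

```python
def route_tokens(router_scores, k=2):
    """Return, for each token, the indices of its k highest-scoring experts."""
    routes = []
    for scores in router_scores:
        # Rank expert indices by this token's score, best first.
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
        routes.append(ranked[:k])
    return routes


# Two tokens from the same generation, four experts.
scores = [
    [0.1, 0.7, 0.2, 0.0],
    [0.6, 0.1, 0.05, 0.25],
]
routes = route_tokens(scores, k=2)
```

<p>The two tokens land on different experts (1 and 2 for the first, 0 and 3 for the second) even though they belong to the same task and the same generation.</p>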
]]></description><pubDate>Wed, 08 Oct 2025 16:48:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45518142</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=45518142</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45518142</guid></item><item><title><![CDATA[New comment by zackangelo in "Apps SDK"]]></title><description><![CDATA[
<p>Because it depends on how much better “best” is. If it’s only incrementally better than open source models that have other advantages, why would you bother?<p>OpenAI’s moat will only come from the products they built on top. Theoretically their products will be better because they’ll be more vertically integrated with the underlying models. It’s not unlike Apple’s playbook with regard to hardware and software integration.</p>
]]></description><pubDate>Mon, 06 Oct 2025 20:36:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45496025</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=45496025</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45496025</guid></item><item><title><![CDATA[New comment by zackangelo in "From multi-head to latent attention: The evolution of attention mechanisms"]]></title><description><![CDATA[
<p>Not quite a frontier model but definitely built by a frontier lab: Grok 2 was recently open sourced and I believe it uses a fairly standard MHA architecture with MoE.</p>
]]></description><pubDate>Sat, 30 Aug 2025 17:23:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=45076391</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=45076391</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45076391</guid></item><item><title><![CDATA[New comment by zackangelo in "Mosh Mobile Shell"]]></title><description><![CDATA[
<p>I feel a bit silly for not noticing this before. Over the last year or so I've often wondered when ssh added protocol-level support for session resume. I'd open my laptop on a new network and everything would be ready to go. But of course, it's nothing to do with ssh, it's just that I started using tailscale.</p>
]]></description><pubDate>Thu, 28 Aug 2025 17:08:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45054536</link><dc:creator>zackangelo</dc:creator><comments>https://news.ycombinator.com/item?id=45054536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45054536</guid></item></channel></rss>