<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mbowcut2</title><link>https://news.ycombinator.com/user?id=mbowcut2</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 16 Apr 2026 15:44:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mbowcut2" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mbowcut2 in "Anthropic Cowork feature creates 10GB VM bundle on macOS without warning"]]></title><description><![CDATA[
<p>Gotta hit that <code>docker system prune -a</code></p>
]]></description><pubDate>Mon, 02 Mar 2026 16:23:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47220074</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=47220074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47220074</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Robert Duvall has died"]]></title><description><![CDATA[
<p>Loved him in Secondhand Lions.</p>
]]></description><pubDate>Mon, 16 Feb 2026 20:44:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47040087</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=47040087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47040087</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Microsoft forced me to switch to Linux"]]></title><description><![CDATA[
<p>If you thought we were getting bad bugs before, just wait until the 90% agent-coded PRs start landing. We're gonna have multiple CrowdStrike-level blowups.</p>
]]></description><pubDate>Wed, 28 Jan 2026 19:16:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46800201</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46800201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46800201</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Proof of Corn"]]></title><description><![CDATA[
<p>It's an interesting concept, but I'm skeptical about how feasible this is. How much design/legwork/intervention will Seth actually contribute during the entire process? I'm thinking "growing corn" might be a little hard for a proof of concept, specifically because the time horizon is quite long. Something shorter-term might work better, like contracting a landscaping job: the model comes up with design ideas, contacts landscapers, gets bids, and accepts a bid. Seth could tell the model that he's its agent, available to sign for things, walk people through the property, etc., but will make no decisions and is only reachable by email or text.</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:09:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46737963</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46737963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737963</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Claude Cowork exfiltrates files"]]></title><description><![CDATA[
<p>Wow, I didn't know about the "skills" feature, but with that as context isn't this attack strategy obvious? Running an unverified skill in Cowork is akin to running unverified code on your machine. The next super-genius attack vector will be something like: Claude Cowork deletes System32 when you give it root access and run the skill "brick_my_machine" /s.</p>
]]></description><pubDate>Thu, 15 Jan 2026 02:27:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46627258</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46627258</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46627258</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Mistral 3 family of models released"]]></title><description><![CDATA[
<p>It makes me wonder about the gaps in evaluating LLMs by benchmarks. There is almost certainly overfitting happening, which could degrade other use cases. "In practice" evaluation is what inspired the Chatbot Arena, right? But then people realized that Chatbot Arena over-prioritizes formatting, and maybe sycophancy(?). Makes you wonder what the best evaluation would be. We probably need lots more task-specific models. That seems to have been fruitful for coding.</p>
]]></description><pubDate>Tue, 02 Dec 2025 17:46:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46124029</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46124029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46124029</guid></item><item><title><![CDATA[New comment by mbowcut2 in "A small number of samples can poison LLMs of any size"]]></title><description><![CDATA[
<p>Seems like the less sexy headline is just something about the sample size needed for LLM fact encoding. That's honestly a more interesting angle to me: how many instances of data X need to be in the training data for the LLM to properly encode it? Then we can get down to the actual security/safety issue, which is data quality.</p>
]]></description><pubDate>Thu, 09 Oct 2025 18:39:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=45531443</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=45531443</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45531443</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Top model scores may be skewed by Git history leaks in SWE-bench"]]></title><description><![CDATA[
<p>I'm not surprised. People really thought the models just kept getting better and better?</p>
]]></description><pubDate>Thu, 11 Sep 2025 19:23:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45215205</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=45215205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45215205</guid></item><item><title><![CDATA[Bezier Curve]]></title><description><![CDATA[
<p>Article URL: <a href="https://javascript.info/bezier-curve">https://javascript.info/bezier-curve</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44868289">https://news.ycombinator.com/item?id=44868289</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 11 Aug 2025 19:17:10 +0000</pubDate><link>https://javascript.info/bezier-curve</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44868289</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44868289</guid></item><item><title><![CDATA[Magnus Carlsen Commentates Grok vs. OpenAI Finale [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=vtHfJ6iYyEY">https://www.youtube.com/watch?v=vtHfJ6iYyEY</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44839594">https://news.ycombinator.com/item?id=44839594</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 08 Aug 2025 17:35:39 +0000</pubDate><link>https://www.youtube.com/watch?v=vtHfJ6iYyEY</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44839594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44839594</guid></item><item><title><![CDATA[New comment by mbowcut2 in "GPT-5"]]></title><description><![CDATA[
<p>It looks like the 2nd and 3rd bar never got updated from the dummy data placeholders lol.</p>
]]></description><pubDate>Thu, 07 Aug 2025 17:52:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44827917</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44827917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44827917</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Genie 3: A new frontier for world models"]]></title><description><![CDATA[
<p>It's not a new problem (for individuals), though perhaps at an unprecedented scale (so, maybe a new problem for civilization). I'm sure there were blacksmiths who felt they had lost their meaning when they were replaced by industrial manufacturing.</p>
]]></description><pubDate>Tue, 05 Aug 2025 16:26:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44800161</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44800161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44800161</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Show HN: I built an AI that turns any book into a text adventure game"]]></title><description><![CDATA[
<p>I've had similar experiences with vanilla ChatGPT as a DM but I bet with clever prompt engineering and context window management you could solve or at least dramatically improve the experience. For example, you could have the model execute a planning step before your session in which it generates a plot outline, character list, story tree, etc. which could then be used for reference during the game session.<p>One problem that would probably still linger is model agreeableness, i.e. despite preparation, models have a tendency to say yes to whatever you ask for, and everybody knows a good DM needs to know when to say no.</p>
]]></description><pubDate>Tue, 29 Jul 2025 18:55:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44727010</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44727010</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44727010</guid></item><item><title><![CDATA[New comment by mbowcut2 in "LLM Embeddings Explained: A Visual and Intuitive Guide"]]></title><description><![CDATA[
<p>You can, and there has been some interesting work done with it. The technique is called LogitLens: basically, you pass intermediate embeddings through the LMHead to get logits corresponding to tokens. In this paper they use it to investigate whether LLMs have a language bias, i.e. does GPT "think" in English? <a href="https://arxiv.org/pdf/2408.10811" rel="nofollow">https://arxiv.org/pdf/2408.10811</a><p>One problem with this technique is that the model wasn't trained with intermediate layers being mapped to logits in the first place, so it's not clear why the LMHead should be able to map them to anything sensible. But alas, like everything in DL research, they threw science at the wall and a bit of it stuck.</p>
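The LogitLens trick can be sketched in a few lines. This is a toy illustration, not real model code: the per-layer hidden states and the unembedding matrix below are random stand-ins for a trained model's activations and LM head, and a real LogitLens typically also applies the model's final LayerNorm before unembedding.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D_MODEL, N_LAYERS = 50, 16, 4

# Hypothetical stand-ins: one residual-stream state per layer for a
# single token position, plus the model's unembedding matrix (LM head).
hidden_states = [rng.normal(size=D_MODEL) for _ in range(N_LAYERS)]
W_unembed = rng.normal(size=(D_MODEL, VOCAB))

def logit_lens(h, W_U):
    """Project an intermediate hidden state straight to vocab logits,
    then softmax to a token distribution."""
    logits = h @ W_U
    probs = np.exp(logits - logits.max())  # subtract max for stability
    return probs / probs.sum()

# Inspect which token each layer "currently predicts".
for layer, h in enumerate(hidden_states):
    probs = logit_lens(h, W_unembed)
    print(f"layer {layer}: top token id = {probs.argmax()}")
```

On a real model, watching the argmax token evolve layer by layer is what lets you ask questions like "at which depth does the prediction switch from an English word to the target-language word?"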
]]></description><pubDate>Mon, 28 Jul 2025 17:39:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44713256</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44713256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44713256</guid></item><item><title><![CDATA[New comment by mbowcut2 in "LLM Embeddings Explained: A Visual and Intuitive Guide"]]></title><description><![CDATA[
<p>The problem with embeddings is that they're basically inscrutable to anything but the model itself. It's true that they must encode the semantic meaning of the input sequence, but the learning process compresses it to the point that only the model's learned decoder head knows what to do with it. Anthropic has developed interpretable internal features for Sonnet 3 [1], but from what I understand that requires somewhat expensive parallel training of a network whose sole purpose is to attempt to disentangle LLM hidden-layer activations.<p>[1] <a href="https://transformer-circuits.pub/2024/scaling-monosemanticity/" rel="nofollow">https://transformer-circuits.pub/2024/scaling-monosemanticit...</a></p>
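For concreteness, the sparse-autoencoder idea behind that interpretability work can be sketched as a toy forward pass. Everything here (dimensions, weights, names) is a made-up stand-in, not Anthropic's actual setup: an overcomplete ReLU encoder maps a dense activation into many mostly-zero features, and a decoder reconstructs the activation from them.

```python
import numpy as np

rng = np.random.default_rng(0)

D_HIDDEN, D_FEATURES = 32, 256  # features >> hidden: overcomplete dictionary

# Hypothetical stand-ins for a trained sparse autoencoder's weights.
W_enc = rng.normal(scale=0.1, size=(D_HIDDEN, D_FEATURES))
b_enc = np.zeros(D_FEATURES)
W_dec = rng.normal(scale=0.1, size=(D_FEATURES, D_HIDDEN))

def sae_features(activation):
    """Encode a dense LLM activation into sparse feature activations."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU zeroes many

def sae_reconstruct(features):
    """Decode sparse features back into the original activation space."""
    return features @ W_dec

act = rng.normal(size=D_HIDDEN)   # one residual-stream activation
feats = sae_features(act)
recon = sae_reconstruct(feats)
# Training would minimize ||act - recon||^2 + lambda * ||feats||_1,
# pushing most features to zero so each active one can be labeled.
```

The expensive part alluded to above is that this second network has to be trained over huge numbers of activations harvested from the frozen LLM, which is the "parallel training" cost.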
]]></description><pubDate>Mon, 28 Jul 2025 15:03:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=44711582</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44711582</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44711582</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Quantitative AI progress needs accurate and transparent evaluation"]]></title><description><![CDATA[
<p>LLMs are better at LaTeX than humans. ChatGPT often writes LaTeX responses.</p>
]]></description><pubDate>Fri, 25 Jul 2025 14:02:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=44683244</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44683244</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44683244</guid></item><item><title><![CDATA[New comment by mbowcut2 in "The Tradeoffs of SSMs and Transformers"]]></title><description><![CDATA[
<p>I think I agree with you. My only rebuttal would be that it's this kind of thinking that's kept the leading players from trying other architectures in the first place. As far as I know, SOTA for SSMs just doesn't suggest potential upsides significant enough to warrant serious R&D, not compared to the tried-and-true established LLM methods. The decision might be something like: "pay X to train a competitive LLM" vs. "pay 2X to MAYBE train a competitive SSM".</p>
]]></description><pubDate>Wed, 09 Jul 2025 00:19:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=44505203</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44505203</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44505203</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Honda conducts successful launch and landing of experimental reusable rocket"]]></title><description><![CDATA[
<p>I read this as "pirate space industry" and got real excited.</p>
]]></description><pubDate>Tue, 17 Jun 2025 19:24:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44302881</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44302881</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44302881</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Honda conducts successful launch and landing of experimental reusable rocket"]]></title><description><![CDATA[
<p>It's interesting how I couldn't tell whether the rocket was 1m tall or 10m tall in this video. Turns out it's actually 6m tall per the link.</p>
]]></description><pubDate>Tue, 17 Jun 2025 19:21:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44302853</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44302853</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44302853</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Apple Announces Foundation Models and Containerization frameworks, etc."]]></title><description><![CDATA[
<p>Nah, I think they made it model agnostic, which is kinda smart.</p>
]]></description><pubDate>Mon, 09 Jun 2025 17:56:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=44227159</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44227159</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44227159</guid></item></channel></rss>