<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mbowcut2</title><link>https://news.ycombinator.com/user?id=mbowcut2</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 16 Apr 2026 15:44:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mbowcut2" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mbowcut2 in "Anthropic Cowork feature creates 10GB VM bundle on macOS without warning"]]></title><description><![CDATA[
<p>Gotta hit that <code>docker system prune -a</code></p>
]]></description><pubDate>Mon, 02 Mar 2026 16:23:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47220074</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=47220074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47220074</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Robert Duvall has died"]]></title><description><![CDATA[
<p>Loved him in Secondhand Lions.</p>
]]></description><pubDate>Mon, 16 Feb 2026 20:44:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47040087</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=47040087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47040087</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Microsoft forced me to switch to Linux"]]></title><description><![CDATA[
<p>If you thought we were getting bad bugs before, just wait until the 90% agent-coded PRs start landing. We're gonna have multiple CrowdStrike-level blowups.</p>
]]></description><pubDate>Wed, 28 Jan 2026 19:16:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46800201</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46800201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46800201</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Proof of Corn"]]></title><description><![CDATA[
<p>It's an interesting concept, but I'm skeptical about how feasible this is. How much design/legwork/intervention will Seth actually contribute during the entire process? I'm thinking "growing corn" might be a little hard for a proof of concept, specifically because the time horizon is quite long. Something shorter-term might work better, like contracting a landscaping job: the model comes up with design ideas, contacts landscapers, gets bids, and accepts a bid. Seth could tell the model that he's its agent, available to sign for things, walk people through the property, etc., but will make no decisions and is only reachable by email or text.</p>
]]></description><pubDate>Fri, 23 Jan 2026 21:09:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46737963</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46737963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737963</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Claude Cowork exfiltrates files"]]></title><description><![CDATA[
<p>Wow, I didn't know about the "skills" feature, but with that as context isn't this attack strategy obvious? Running an unverified skill in Cowork is akin to running unverified code on your machine. The next super-genius attack vector will be something like: Claude Cowork deletes System32 when you give it root access and run the skill "brick_my_machine" /s.</p>
]]></description><pubDate>Thu, 15 Jan 2026 02:27:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46627258</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46627258</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46627258</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Mistral 3 family of models released"]]></title><description><![CDATA[
<p>It makes me wonder about the gaps in evaluating LLMs by benchmarks. There is almost certainly overfitting happening, which could degrade other use cases. "In practice" evaluation is what inspired the Chatbot Arena, right? But then people realized that Chatbot Arena over-prioritizes formatting, and maybe sycophancy(?). Makes you wonder what the best evaluation would be. We probably need lots more task-specific models. That seems to have been fruitful for coding.</p>
]]></description><pubDate>Tue, 02 Dec 2025 17:46:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46124029</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=46124029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46124029</guid></item><item><title><![CDATA[New comment by mbowcut2 in "A small number of samples can poison LLMs of any size"]]></title><description><![CDATA[
<p>Seems like the less sexy headline is just something about the sample size needed for LLM fact encoding. That's honestly a more interesting angle to me: how many instances of data X need to be in the training data for the LLM to properly encode it? Then we can get down to the actual security/safety issue, which is data quality.</p>
]]></description><pubDate>Thu, 09 Oct 2025 18:39:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=45531443</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=45531443</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45531443</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Top model scores may be skewed by Git history leaks in SWE-bench"]]></title><description><![CDATA[
<p>I'm not surprised. People really thought the models just kept getting better and better?</p>
]]></description><pubDate>Thu, 11 Sep 2025 19:23:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45215205</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=45215205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45215205</guid></item><item><title><![CDATA[Bezier Curve]]></title><description><![CDATA[
<p>Article URL: <a href="https://javascript.info/bezier-curve">https://javascript.info/bezier-curve</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44868289">https://news.ycombinator.com/item?id=44868289</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 11 Aug 2025 19:17:10 +0000</pubDate><link>https://javascript.info/bezier-curve</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44868289</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44868289</guid></item><item><title><![CDATA[Magnus Carlsen Commentates Grok vs. OpenAI Finale [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=vtHfJ6iYyEY">https://www.youtube.com/watch?v=vtHfJ6iYyEY</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44839594">https://news.ycombinator.com/item?id=44839594</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 08 Aug 2025 17:35:39 +0000</pubDate><link>https://www.youtube.com/watch?v=vtHfJ6iYyEY</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44839594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44839594</guid></item><item><title><![CDATA[New comment by mbowcut2 in "GPT-5"]]></title><description><![CDATA[
<p>It looks like the 2nd and 3rd bar never got updated from the dummy data placeholders lol.</p>
]]></description><pubDate>Thu, 07 Aug 2025 17:52:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44827917</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44827917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44827917</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Genie 3: A new frontier for world models"]]></title><description><![CDATA[
<p>It's not a new problem (for individuals), though perhaps at an unprecedented scale (so, maybe a new problem for civilization). I'm sure there were blacksmiths who felt they had lost their meaning when they were replaced by industrial manufacturing.</p>
]]></description><pubDate>Tue, 05 Aug 2025 16:26:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44800161</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44800161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44800161</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Show HN: I built an AI that turns any book into a text adventure game"]]></title><description><![CDATA[
<p>I've had similar experiences with vanilla ChatGPT as a DM but I bet with clever prompt engineering and context window management you could solve or at least dramatically improve the experience. For example, you could have the model execute a planning step before your session in which it generates a plot outline, character list, story tree, etc. which could then be used for reference during the game session.<p>One problem that would probably still linger is model agreeableness, i.e. despite preparation, models have a tendency to say yes to whatever you ask for, and everybody knows a good DM needs to know when to say no.</p>
]]></description><pubDate>Tue, 29 Jul 2025 18:55:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44727010</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44727010</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44727010</guid></item><item><title><![CDATA[New comment by mbowcut2 in "LLM Embeddings Explained: A Visual and Intuitive Guide"]]></title><description><![CDATA[
<p>You can, and there has been some interesting work done with it. The technique is called LogitLens: basically, you pass intermediate embeddings through the LMHead to get logits corresponding to tokens. In this paper they use it to investigate whether LLMs have a language bias, i.e. does GPT "think" in English? <a href="https://arxiv.org/pdf/2408.10811" rel="nofollow">https://arxiv.org/pdf/2408.10811</a><p>One problem with this technique is that the model wasn't trained with intermediate layers being mapped to logits in the first place, so it's not clear why the LMHead should be able to map them to anything sensible. But alas, like everything in DL research, they threw science at the wall and a bit of it stuck.</p>
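The LogitLens trick can be sketched in a few lines. This is a toy illustration, not real model code: the per-layer hidden states and the unembedding matrix below are random stand-ins for a trained model's activations and LM head, and a real LogitLens typically also applies the model's final LayerNorm before unembedding.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D_MODEL, N_LAYERS = 50, 16, 4

# Hypothetical stand-ins: one residual-stream state per layer for a
# single token position, plus the model's unembedding matrix (LM head).
hidden_states = [rng.normal(size=D_MODEL) for _ in range(N_LAYERS)]
W_unembed = rng.normal(size=(D_MODEL, VOCAB))

def logit_lens(h, W_U):
    """Project an intermediate hidden state straight to vocab logits,
    then softmax to a token distribution."""
    logits = h @ W_U
    probs = np.exp(logits - logits.max())  # subtract max for stability
    return probs / probs.sum()

# Inspect which token each layer "currently predicts".
for layer, h in enumerate(hidden_states):
    probs = logit_lens(h, W_unembed)
    print(f"layer {layer}: top token id = {probs.argmax()}")
```

On a real model, watching the argmax token evolve layer by layer is what lets you ask questions like "at which depth does the prediction switch from an English word to the target-language word?"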
]]></description><pubDate>Mon, 28 Jul 2025 17:39:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44713256</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44713256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44713256</guid></item><item><title><![CDATA[New comment by mbowcut2 in "LLM Embeddings Explained: A Visual and Intuitive Guide"]]></title><description><![CDATA[
<p>The problem with embeddings is that they're basically inscrutable to anything but the model itself. It's true that they must encode the semantic meaning of the input sequence, but the learning process compresses it to the point that only the model's learned decoder head knows what to do with it. Anthropic has developed interpretable internal features for Sonnet 3 [1], but from what I understand that requires somewhat expensive parallel training of a network whose sole purpose is to attempt to disentangle LLM hidden-layer activations.<p>[1] <a href="https://transformer-circuits.pub/2024/scaling-monosemanticity/" rel="nofollow">https://transformer-circuits.pub/2024/scaling-monosemanticit...</a></p>
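For concreteness, the sparse-autoencoder idea behind that interpretability work can be sketched as a toy forward pass. Everything here (dimensions, weights, names) is a made-up stand-in, not Anthropic's actual setup: an overcomplete ReLU encoder maps a dense activation into many mostly-zero features, and a decoder reconstructs the activation from them.

```python
import numpy as np

rng = np.random.default_rng(0)

D_HIDDEN, D_FEATURES = 32, 256  # features >> hidden: overcomplete dictionary

# Hypothetical stand-ins for a trained sparse autoencoder's weights.
W_enc = rng.normal(scale=0.1, size=(D_HIDDEN, D_FEATURES))
b_enc = np.zeros(D_FEATURES)
W_dec = rng.normal(scale=0.1, size=(D_FEATURES, D_HIDDEN))

def sae_features(activation):
    """Encode a dense LLM activation into sparse feature activations."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU zeroes many

def sae_reconstruct(features):
    """Decode sparse features back into the original activation space."""
    return features @ W_dec

act = rng.normal(size=D_HIDDEN)   # one residual-stream activation
feats = sae_features(act)
recon = sae_reconstruct(feats)
# Training would minimize ||act - recon||^2 + lambda * ||feats||_1,
# pushing most features to zero so each active one can be labeled.
```

The expensive part alluded to above is that this second network has to be trained over huge numbers of activations harvested from the frozen LLM, which is the "parallel training" cost.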
]]></description><pubDate>Mon, 28 Jul 2025 15:03:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=44711582</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44711582</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44711582</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Quantitative AI progress needs accurate and transparent evaluation"]]></title><description><![CDATA[
<p>LLMs are better at LaTeX than humans. ChatGPT often writes LaTeX responses.</p>
]]></description><pubDate>Fri, 25 Jul 2025 14:02:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=44683244</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44683244</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44683244</guid></item><item><title><![CDATA[New comment by mbowcut2 in "The Tradeoffs of SSMs and Transformers"]]></title><description><![CDATA[
<p>I think I agree with you. My only rebuttal would be that it's this kind of thinking that's kept the leading players from trying other architectures in the first place. As far as I know, SOTA for SSMs just doesn't suggest potential upsides significant enough to warrant serious R&D, not compared to the tried-and-true established LLM methods. The decision might be something like: "pay X to train a competitive LLM" vs. "pay 2X to MAYBE train a competitive SSM".</p>
]]></description><pubDate>Wed, 09 Jul 2025 00:19:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=44505203</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44505203</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44505203</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Honda conducts successful launch and landing of experimental reusable rocket"]]></title><description><![CDATA[
<p>I read this as "pirate space industry" and got real excited.</p>
]]></description><pubDate>Tue, 17 Jun 2025 19:24:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44302881</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44302881</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44302881</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Honda conducts successful launch and landing of experimental reusable rocket"]]></title><description><![CDATA[
<p>It's interesting how I couldn't tell whether the rocket was 1m tall or 10m tall in this video. Turns out it's actually 6m tall per the link.</p>
]]></description><pubDate>Tue, 17 Jun 2025 19:21:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44302853</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44302853</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44302853</guid></item><item><title><![CDATA[New comment by mbowcut2 in "Apple Announces Foundation Models and Containerization frameworks, etc."]]></title><description><![CDATA[
<p>Nah, I think they made it model agnostic, which is kinda smart.</p>
]]></description><pubDate>Mon, 09 Jun 2025 17:56:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=44227159</link><dc:creator>mbowcut2</dc:creator><comments>https://news.ycombinator.com/item?id=44227159</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44227159</guid></item></channel></rss>