<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: redfloatplane</title><link>https://news.ycombinator.com/user?id=redfloatplane</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 04 May 2026 15:57:51 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=redfloatplane" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by redfloatplane in "A desktop made for one"]]></title><description><![CDATA[
<p>Great post, thanks for sharing. Evidently I am far behind the real thinkers! The Robin Sloan post mentioned is also very good.</p>
]]></description><pubDate>Mon, 04 May 2026 06:44:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48005422</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=48005422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48005422</guid></item><item><title><![CDATA[New comment by redfloatplane in "A desktop made for one"]]></title><description><![CDATA[
<p>I (and I'm sure many others) have been thinking about this a lot over the last couple of months. I called it "Extremely Personal Software" in a blog post a few months ago (<a href="https://redfloatplane.lol/blog/14-releasing-software-now/" rel="nofollow">https://redfloatplane.lol/blog/14-releasing-software-now/</a>), but there are lots of names and concepts floating about for the same basic idea.<p>I think it's possible that more new software will be written for an audience of 1-10 in 2026 than in any previous year, and that the same will be true again for many years to come. I also think a lot of this software will be essentially 'hidden' - people just writing this stuff for themselves, because the cost of telling an agent what you want is very low compared with the cost of actually planning out a software design and so forth.<p>Interoperability will probably be important in the next few years, and I wonder if this is something solvable at the agent/LLM level (standing instructions like 'typically, use sqlite, use plaintext, use open standards' or whatever). I also think observability and ops will be pretty important - many people want personal software but don't care for the maintenance and upkeep.</p>
]]></description><pubDate>Sun, 03 May 2026 17:53:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47999529</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47999529</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47999529</guid></item><item><title><![CDATA[New comment by redfloatplane in "Porsche will contest Laguna Seca in historic colors of the Apple Computer livery"]]></title><description><![CDATA[
<p>Ah, they just today did Formula E in a Pink Pig livery (<a href="https://www.fiaformulae.com/en/news/1062899" rel="nofollow">https://www.fiaformulae.com/en/news/1062899</a>) but I think the Apple livery might have been more apt.</p>
]]></description><pubDate>Sun, 03 May 2026 17:41:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47999431</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47999431</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47999431</guid></item><item><title><![CDATA[New comment by redfloatplane in "Withnail's Coat and I"]]></title><description><![CDATA[
<p>Maybe I meant that the amount of detail is sustained no matter how close you look? Maybe I was careless with my words? This is unnecessarily pedantic. I enjoyed the article. See you another time, CyberDildonics</p>
]]></description><pubDate>Wed, 29 Apr 2026 11:59:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47947118</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47947118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47947118</guid></item><item><title><![CDATA[New comment by redfloatplane in "Withnail's Coat and I"]]></title><description><![CDATA[
<p>Did you read the article? It's entirely about a concrete artefact from that old movie, down to the kind of tweed, now made by only six people in Scotland. I'm not sure how you come to this response.</p>
]]></description><pubDate>Wed, 29 Apr 2026 11:12:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47946719</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47946719</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47946719</guid></item><item><title><![CDATA[New comment by redfloatplane in "Withnail's Coat and I"]]></title><description><![CDATA[
<p>I love it when this kind of thing surfaces on HN. It’s always so enjoyable to have the fractal nature of detail in the world shown to you. Really nice to read as well.</p>
]]></description><pubDate>Wed, 29 Apr 2026 08:13:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47945479</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47945479</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47945479</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Assuming they would understand it as artificial, that is - I think many people would assume it was a human intelligence in a cyborg trenchcoat, and it would be hard to convince them it wasn't literally a guy named Claude, an incredibly fast typist with a million pre-cached templated answers for things.<p>But in general, yeah, I agree, I think they would think it was a sentient, conscious, emotional being. And then the question is - why do we not think that now?<p>As I said, I don't have a particularly strong opinion, but it's very interesting (and fun!) to think about.</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:38:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47681052</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47681052</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47681052</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Hmm, it's been a long time since I watched it. I was thinking mostly about first-contact sci-fi, but Ex Machina is certainly quite prescient. There's also Blade Runner, I guess.<p>In general I was wondering what I would have thought seeing Claude today side-by-side with the original ChatGPT, and then going back further to GPT-2 or BERT (which I used to generate stochastic 'poetry' back in 2019). And then… what about before? Markov chains? How far back do I need to go before it flips from "impressive but technically explainable emergent behaviour of a computer program" to "this is a sentient being"? 1991 is probably too far; I'd say maybe pre-Matrix 1999 is a good point, but that depends on a lot of cultural priors and so on as well.</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:33:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680996</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47680996</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680996</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Thanks, I find it very interesting as well. I think a great many people would assume they must be interacting with another person, and I don't think there's really a way to _prove_ it's not that through conversation alone. But we do have a lot of mechanisms for understanding how others think through conversation only, and so I think the approach of having a clinical psychiatrist interact with the model makes sense.</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:16:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680810</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47680810</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680810</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Thanks for sharing that talk, enjoyed watching it!</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:00:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680601</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47680601</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680601</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" and they said, paraphrasing, "We reached out and talked to biologists and asked them to rank the model between 0 and 4 where 4 is a world expert, and the median people said it was a 2, which was that it helped them save time in the way a capable colleague would" specifically "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains"<p>so I'm just telling you they did the thing you said you wanted.</p>
]]></description><pubDate>Tue, 07 Apr 2026 19:26:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680168</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47680168</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680168</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>A thought experiment: It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels?<p>I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.</p>
]]></description><pubDate>Tue, 07 Apr 2026 19:18:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680059</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47680059</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680059</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?<p>Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1:<p>>Uplift and feasibility results<p>>The median expert assessed the model as a force-multiplier that saves meaningful time
(uplift level 2 of 4), with only two biology experts rating it comparable to consulting a
knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were
able to iterate with the model toward a plan they judged as having only narrow gaps, but
feasibility scores reflected that substantial outside expertise remained necessary to close
them.<p>There are other similar examples in the system card.</p>
]]></description><pubDate>Tue, 07 Apr 2026 19:11:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47679968</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47679968</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47679968</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Yeah, good point, thanks for noting that, I'll correct.</p>
]]></description><pubDate>Tue, 07 Apr 2026 18:50:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47679692</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47679692</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47679692</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>There's been a section on this in nearly every system card Anthropic has published, so this isn't a new thing - and this model doesn't carry notably higher risk than past models either:<p>> 2.1.3.2 On chemical and biological risks<p>> We believe that Mythos Preview does not pass this threshold due to its noted limitations in
open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we
consider the uplift of threat actors without the ability to develop such weapons to be
limited (with uncertainty about the extent to which weapons development by threat actors
with existing expertise may be accelerated), even if we were to release the model for
general availability. The overall picture is similar to the one from our most recent Risk
Report.</p>
]]></description><pubDate>Tue, 07 Apr 2026 18:48:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47679672</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47679672</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47679672</guid></item><item><title><![CDATA[New comment by redfloatplane in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>The system card for Claude Mythos (PDF): <a href="https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf" rel="nofollow">https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...</a><p>Interesting to see that they will not be releasing Mythos generally. [edit: Mythos <i>Preview</i> generally - fair to say they may release a similar model but not this exact one]<p>I'm still reading the system card but here's a little highlight:<p>> Early indications in the training of Claude Mythos Preview suggested that the model was
likely to have very strong general capabilities. We were sufficiently concerned about the
potential risks of such a model that, for the first time, we arranged a 24-hour period of
internal alignment review (discussed in the alignment assessment) before deploying an
early version of the model for widespread internal use. This was in order to gain assurance
against the model causing damage when interacting with internal infrastructure.<p>and interestingly:<p>> To be explicit, the decision not to make this model generally available does _not_ stem from
Responsible Scaling Policy requirements.<p>Also really worth reading is section 7.2, which describes how the model "feels" to interact with. That's also what I remember from the release of Opus 4.5 in November - in a video, an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence': few of my co-workers could pass SWE-bench, but I would trust quite a few of them, and it's not entirely the same set.<p>Also very interesting is that they believe Mythos poses a higher risk than past models as an autonomous saboteur, to the point that they've published a separate risk report for that specific threat model: <a href="https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de43218158e5f25c.pdf" rel="nofollow">https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...</a><p>The threat model in question:<p>> An AI model with access to powerful affordances within an
organization could use its affordances to autonomously exploit,
manipulate, or tamper with that organization’s systems or
decision-making in a way that raises the risk of future
significantly harmful outcomes (e.g. by altering the results of AI
safety research).</p>
]]></description><pubDate>Tue, 07 Apr 2026 18:29:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47679406</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47679406</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47679406</guid></item><item><title><![CDATA[New comment by redfloatplane in "Sci-Fi Short Film “There Is No Antimemetics Division” [video]"]]></title><description><![CDATA[
<p>Thanks, that’s the answer I was looking for!</p>
]]></description><pubDate>Tue, 17 Mar 2026 16:52:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47415233</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47415233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47415233</guid></item><item><title><![CDATA[New comment by redfloatplane in "Sci-Fi Short Film “There Is No Antimemetics Division” [video]"]]></title><description><![CDATA[
<p>I wonder whether you read the re-release or the original release. I believe it was recently re-released with a bit of an editing pass, but I haven't read that version myself. I just recently reread Fine Structure and it definitely had a strong sense of being written sequentially, one chapter after another, and (very) lightly edited after the fact. I'd recommend Valuable Humans in Transit, a short story collection by the same author, which works a bit better for me. I've since moved on to Exhalation by Ted Chiang, which is also a very good short story collection. And just in general, I want to recommend Clarkesworld: <a href="https://clarkesworldmagazine.com" rel="nofollow">https://clarkesworldmagazine.com</a></p>
]]></description><pubDate>Tue, 17 Mar 2026 14:25:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47413148</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47413148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47413148</guid></item><item><title><![CDATA[New comment by redfloatplane in "Reverse-engineering Viktor and making it open source"]]></title><description><![CDATA[
<p>Ah, you're right. Headquartered in Delaware. Oh well. Thanks for spotting!</p>
]]></description><pubDate>Tue, 17 Mar 2026 11:37:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47411292</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47411292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47411292</guid></item><item><title><![CDATA[New comment by redfloatplane in "Reverse-engineering Viktor and making it open source"]]></title><description><![CDATA[
<p>It's not completely clear that this is the original source. According to the post, it's a reimplementation based on documentation created from the original source, or perhaps from developer documentation and the SDK. Whether that's the same thing from a legal standpoint, I don't really know; from a personal-morality standpoint, I think it's clear that they are the same thing.</p>
]]></description><pubDate>Tue, 17 Mar 2026 11:12:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47411099</link><dc:creator>redfloatplane</dc:creator><comments>https://news.ycombinator.com/item?id=47411099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47411099</guid></item></channel></rss>