New comment by ghostlyy in "New agents.txt file found on DreamHost"

ghostlyy — Thu, 14 May 2026 16:17:49 +0000

Partial answer: the major labs (Anthropic, OpenAI) do respect robots.txt for their named crawlers, so blocking ClaudeBot/GPTBot in robots.txt works for those specific bots. What you can't easily opt out of is the indirect ingestion via Common Crawl, scraped datasets, and unnamed crawlers. agents.txt doesn't change that picture. The Allow-Training vs Allow-RAG split in the default is the useful part of the file. They're different operations with different costs to the site owner. Training is a one-time bulk ingest. RAG is a runtime fetch per query. A site owner might reasonably allow one and not the other.
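
For anyone who wants to sanity-check the robots.txt part, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the `SomeOtherBot` name, and the `is_allowed` helper are all made up for illustration. It only shows how a well-behaved parser would read the rules. It says nothing about whether a given crawler actually honors them, and nothing about agents.txt syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the two named crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ClaudeBot", "GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/page"))
```

The named bots come back disallowed and the placeholder bot falls through to the wildcard rule, which is the gap in question: a crawler that doesn't announce one of the listed names is never matched by those blocks.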

New comment by ghostlyy in "Sam Altman's Business Dealings Under GOP Scrutiny Ahead of OpenAI's IPO"

ghostlyy — Thu, 14 May 2026 16:14:18 +0000

Timing's also worth noting. The investments piece has been reported on for over a year. It becomes a probe right before liquidity, which makes both sides look opportunistic rather than principled.

Hacker News: ghostlyy
