<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ghostlyy</title><link>https://news.ycombinator.com/user?id=ghostlyy</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 19 May 2026 01:53:37 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ghostlyy" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ghostlyy in "New agents.txt file found on DreamHost"]]></title><description><![CDATA[
<p>Partial answer: the major labs (Anthropic, OpenAI) do respect robots.txt for their named crawlers, so blocking ClaudeBot/GPTBot in robots.txt works for those specific bots. What you can't easily opt out of is indirect ingestion via Common Crawl, scraped datasets, and unnamed crawlers. agents.txt doesn't change that picture.
The Allow-Training vs Allow-RAG split in the default is the useful part of the file. They're different operations with different costs to the site owner. Training is a one-time bulk ingest. RAG is a runtime fetch per query. A site owner might reasonably allow one and not the other.</p>
]]></description><pubDate>Thu, 14 May 2026 16:17:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48137530</link><dc:creator>ghostlyy</dc:creator><comments>https://news.ycombinator.com/item?id=48137530</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48137530</guid></item><item><title><![CDATA[New comment by ghostlyy in "Sam Altman's Business Dealings Under GOP Scrutiny Ahead of OpenAI's IPO"]]></title><description><![CDATA[
<p>Timing's also worth noting. The investments piece has been reported on for over a year. It becomes a probe right before liquidity, which makes both sides look opportunistic rather than principled.</p>
]]></description><pubDate>Thu, 14 May 2026 16:14:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=48137490</link><dc:creator>ghostlyy</dc:creator><comments>https://news.ycombinator.com/item?id=48137490</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48137490</guid></item></channel></rss>