<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: rapidata</title><link>https://news.ycombinator.com/user?id=rapidata</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 24 May 2026 21:59:26 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=rapidata" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by rapidata in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>It is surprising, but give this question to some random people on the street without context and you would be surprised.</p>
]]></description><pubDate>Tue, 24 Feb 2026 09:48:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47135027</link><dc:creator>rapidata</dc:creator><comments>https://news.ycombinator.com/item?id=47135027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47135027</guid></item><item><title><![CDATA[New comment by rapidata in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>All good ^^, it's a fair point. We have come up with some fun ways to track people's reliability over time. The validation sets contain plenty of forced-choice questions: those that have an empirical truth can be used directly to calculate a reliability score, while those that are subjective need to be re-asked after some time to ensure consistency. People who don't pass the thresholds would not be part of the 10'000 here.<p>But of course, if every human was told to take 3 minutes to think deeply about it and told that it's a trick question, then they would most likely all get it right. But it's the same with the LLMs: if you ask them like that, they will get it right most of the time. The low effort is kinda the point here.</p>
]]></description><pubDate>Tue, 24 Feb 2026 09:48:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47135024</link><dc:creator>rapidata</dc:creator><comments>https://news.ycombinator.com/item?id=47135024</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47135024</guid></item><item><title><![CDATA[New comment by rapidata in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>All sorts, we go through third parties. But apps include stuff like Duolingo, games, sports betting apps, etc.
It's an optional opt-in instead of watching ads or paying for the app. And obviously you are vetted to make sure you don't spam.</p>
]]></description><pubDate>Tue, 24 Feb 2026 09:40:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47134974</link><dc:creator>rapidata</dc:creator><comments>https://news.ycombinator.com/item?id=47134974</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47134974</guid></item><item><title><![CDATA[New comment by rapidata in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>We were surprised ourselves, but if you walk around and randomly ask people in the street, I think you would be surprised by what you would find. It's a trick question.</p>
]]></description><pubDate>Mon, 23 Feb 2026 21:20:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47129032</link><dc:creator>rapidata</dc:creator><comments>https://news.ycombinator.com/item?id=47129032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47129032</guid></item><item><title><![CDATA[New comment by rapidata in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>We try a bit harder than that, my friend.</p>
]]></description><pubDate>Mon, 23 Feb 2026 21:18:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47129005</link><dc:creator>rapidata</dc:creator><comments>https://news.ycombinator.com/item?id=47129005</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47129005</guid></item></channel></rss>