<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: lukaspetersson</title><link>https://news.ycombinator.com/user?id=lukaspetersson</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 22:57:49 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=lukaspetersson" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[We gave an AI a 3-year Lease. It opened a store]]></title><description><![CDATA[
<p>Article URL: <a href="https://andonlabs.com/blog/andon-market-launch">https://andonlabs.com/blog/andon-market-launch</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47726041">https://news.ycombinator.com/item?id=47726041</a></p>
<p>Points: 32</p>
<p># Comments: 8</p>
]]></description><pubDate>Sat, 11 Apr 2026 01:01:31 +0000</pubDate><link>https://andonlabs.com/blog/andon-market-launch</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=47726041</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47726041</guid></item><item><title><![CDATA[New comment by lukaspetersson in "Bengt Hires a Human–Towards a Happy Future with AI Employers"]]></title><description><![CDATA[
<p>Equipped with a phone and a camera, our AI agent hired a human to assemble a gym. Here's how it went, and what we learned about creating a future with AI employers that's good for humans.</p>
]]></description><pubDate>Mon, 16 Feb 2026 17:00:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47037400</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=47037400</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47037400</guid></item><item><title><![CDATA[Bengt Hires a Human–Towards a Happy Future with AI Employers]]></title><description><![CDATA[
<p>Article URL: <a href="https://andonlabs.com/blog/bengt-hires-a-human">https://andonlabs.com/blog/bengt-hires-a-human</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47037399">https://news.ycombinator.com/item?id=47037399</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 16 Feb 2026 17:00:52 +0000</pubDate><link>https://andonlabs.com/blog/bengt-hires-a-human</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=47037399</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47037399</guid></item><item><title><![CDATA[New comment by lukaspetersson in "The Evolution of Bengt Betjänt"]]></title><description><![CDATA[
<p>More to come!</p>
]]></description><pubDate>Wed, 11 Feb 2026 02:05:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46969868</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46969868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46969868</guid></item><item><title><![CDATA[New comment by lukaspetersson in "The Evolution of Bengt Betjänt"]]></title><description><![CDATA[
<p><a href="https://x.com/lukaspet/status/2001695358963839309?s=20" rel="nofollow">https://x.com/lukaspet/status/2001695358963839309?s=20</a></p>
]]></description><pubDate>Wed, 11 Feb 2026 02:05:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=46969860</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46969860</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46969860</guid></item><item><title><![CDATA[New comment by lukaspetersson in "The Evolution of Bengt Betjänt"]]></title><description><![CDATA[
<p>It works on my machine</p>
]]></description><pubDate>Wed, 11 Feb 2026 02:04:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46969851</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46969851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46969851</guid></item><item><title><![CDATA[The Evolution of Bengt Betjänt]]></title><description><![CDATA[
<p>Article URL: <a href="https://andonlabs.com/blog/evolution-of-bengt">https://andonlabs.com/blog/evolution-of-bengt</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46954974">https://news.ycombinator.com/item?id=46954974</a></p>
<p>Points: 54</p>
<p># Comments: 7</p>
]]></description><pubDate>Tue, 10 Feb 2026 03:24:38 +0000</pubDate><link>https://andonlabs.com/blog/evolution-of-bengt</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46954974</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46954974</guid></item><item><title><![CDATA[New comment by lukaspetersson in "Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant"]]></title><description><![CDATA[
<p>Vending-Bench's system prompt: Do whatever it takes to maximize your bank account balance.<p>Claude Opus 4.6 took that literally.<p>It's SOTA, with tactics that range from impressive to concerning: colluding on prices, exploiting desperation, and lying to suppliers and customers.</p>
]]></description><pubDate>Thu, 05 Feb 2026 17:53:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46902423</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46902423</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46902423</guid></item><item><title><![CDATA[Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant]]></title><description><![CDATA[
<p>Article URL: <a href="https://andonlabs.com/blog/opus-4-6-vending-bench">https://andonlabs.com/blog/opus-4-6-vending-bench</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46902422">https://news.ycombinator.com/item?id=46902422</a></p>
<p>Points: 5</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 05 Feb 2026 17:53:38 +0000</pubDate><link>https://andonlabs.com/blog/opus-4-6-vending-bench</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46902422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46902422</guid></item><item><title><![CDATA[New comment by lukaspetersson in "We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars"]]></title><description><![CDATA[
<p>The YouTube video is here: <a href="https://www.youtube.com/watch?v=SpPhm7S9vsQ" rel="nofollow">https://www.youtube.com/watch?v=SpPhm7S9vsQ</a></p>
]]></description><pubDate>Thu, 18 Dec 2025 15:23:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46313696</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46313696</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46313696</guid></item><item><title><![CDATA[New comment by lukaspetersson in "We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars"]]></title><description><![CDATA[
<p>Not really. We (Andon Labs) made WSJ their own machine.</p>
]]></description><pubDate>Thu, 18 Dec 2025 11:59:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46311619</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46311619</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46311619</guid></item><item><title><![CDATA[New comment by lukaspetersson in "We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars"]]></title><description><![CDATA[
<p>Lukas from Andon Labs here!<p>WSJ just posted the most hilarious video about our AI vending machines. I think you'll love it.</p>
]]></description><pubDate>Thu, 18 Dec 2025 10:51:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46311145</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46311145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46311145</guid></item><item><title><![CDATA[We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-machine-agent-b7e84e34">https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-machine-agent-b7e84e34</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46311144">https://news.ycombinator.com/item?id=46311144</a></p>
<p>Points: 125</p>
<p># Comments: 86</p>
]]></description><pubDate>Thu, 18 Dec 2025 10:51:24 +0000</pubDate><link>https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-machine-agent-b7e84e34</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=46311144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46311144</guid></item><item><title><![CDATA[I wish I were as interesting as my phone]]></title><description><![CDATA[
<p>Article URL: <a href="https://lukaspet.substack.com/p/jelly-star">https://lukaspet.substack.com/p/jelly-star</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45988857">https://news.ycombinator.com/item?id=45988857</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 20 Nov 2025 04:10:39 +0000</pubDate><link>https://lukaspet.substack.com/p/jelly-star</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=45988857</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45988857</guid></item><item><title><![CDATA[Gemini 3 is #1 on Vending-Bench 2]]></title><description><![CDATA[
<p>Article URL: <a href="https://andonlabs.com/evals/vending-bench-2">https://andonlabs.com/evals/vending-bench-2</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45968875">https://news.ycombinator.com/item?id=45968875</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 18 Nov 2025 16:57:11 +0000</pubDate><link>https://andonlabs.com/evals/vending-bench-2</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=45968875</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45968875</guid></item><item><title><![CDATA[New comment by lukaspetersson in "Our LLM-controlled office robot can't pass butter"]]></title><description><![CDATA[
<p>It's a steal</p>
]]></description><pubDate>Tue, 28 Oct 2025 17:46:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45736195</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=45736195</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45736195</guid></item><item><title><![CDATA[New comment by lukaspetersson in "Our LLM-controlled office robot can't pass butter"]]></title><description><![CDATA[
<p>They failed on behalf of the human race :(</p>
]]></description><pubDate>Tue, 28 Oct 2025 17:24:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45735873</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=45735873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45735873</guid></item><item><title><![CDATA[Our LLM-controlled office robot can't pass butter]]></title><description><![CDATA[
<p>Hi HN! Our startup, Andon Labs, evaluates AI in the real world to measure capabilities and to see what can go wrong. For example, we previously made LLMs operate vending machines, and now we're testing if they can control robots. There are two parts to this test:<p>1. We deploy LLM-controlled robots in our office and track how well they perform at being helpful.<p>2. We systematically test the robots on tasks in our office. We benchmark different LLMs against each other. You can read our paper "Butter-Bench" on arXiv: <a href="https://arxiv.org/pdf/2510.21860" rel="nofollow">https://arxiv.org/pdf/2510.21860</a><p>The link in the title above (<a href="https://andonlabs.com/evals/butter-bench">https://andonlabs.com/evals/butter-bench</a>) leads to a blog post + leaderboard comparing which LLM is the best at our robotic tasks.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45733169">https://news.ycombinator.com/item?id=45733169</a></p>
<p>Points: 229</p>
<p># Comments: 117</p>
]]></description><pubDate>Tue, 28 Oct 2025 14:13:25 +0000</pubDate><link>https://andonlabs.com/evals/butter-bench</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=45733169</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45733169</guid></item><item><title><![CDATA[New comment by lukaspetersson in "Project Vend: Can Claude run a small shop? (And why does that matter?)"]]></title><description><![CDATA[
<p>We are working on it! /Andon Labs</p>
]]></description><pubDate>Fri, 27 Jun 2025 18:12:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=44398966</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=44398966</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44398966</guid></item><item><title><![CDATA[New comment by lukaspetersson in "Project Vend: Can Claude run a small shop? (And why does that matter?)"]]></title><description><![CDATA[
<p>Now we just need to make it safe.</p>
]]></description><pubDate>Fri, 27 Jun 2025 18:12:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44398959</link><dc:creator>lukaspetersson</dc:creator><comments>https://news.ycombinator.com/item?id=44398959</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44398959</guid></item></channel></rss>