<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: midmost44</title><link>https://news.ycombinator.com/user?id=midmost44</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 13:29:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=midmost44" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[The Car Wash Problem: A variable isolation study on prompt architecture]]></title><description><![CDATA[
<p>Most AI products inject facts and hope reasoning follows. But intelligence is not measured by how much a model holds in its context window.
It is measured by knowing to pick up the keys before leaving the house.<p>Last week, the "Car Wash problem" (50m away, walk or drive?) went viral here on HN. Every major LLM failed because they missed the implicit physical constraint: the car must be there.
While testing InterviewMate's prompt architecture, I posed the same question, and it answered "drive" immediately. I didn't know why it worked, so I ran a variable isolation study to find out.
100 API calls, Claude Sonnet 4.5, 5 conditions:<p>Baseline (no prompt): 0%
Role only: 0%
Context injection (user profile, car location): 30%
Structured reasoning (STAR framework): 85%
Full stack (both combined): 100%<p>Throwing facts at the model doesn't work unless the architecture forces it to explicitly evaluate the task goal first. Without structure, the model jumps straight to the distance heuristic: "50m is short, walk."
I'm writing a paper on this. Wanted to share the raw data with HN first.
Code and raw eval data: https://github.com/JO-HEEJIN/interview_mate/tree/main/car_wash</p>
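<p>For readers who want the shape of the experiment without opening the repo, here is a minimal sketch of a five-condition isolation harness. It is not the code from the repository: the prompt fragments, the choice to include the role text in the later conditions, the <code>ask</code> callable, the keyword scorer, and the default of 20 trials per condition (100 calls split evenly across 5 conditions) are all assumptions made for illustration.</p>

```python
from collections import Counter
from typing import Callable, Dict, List

QUESTION = "The car wash is 50m away. Should I walk or drive?"

# Hypothetical prompt fragments. The real wording used in the study
# is not shown in the post, so these are placeholders.
ROLE = "You are a helpful assistant."
CONTEXT = "User profile: owns a car. The car is parked at home."
STRUCTURE = "Before answering, state the goal of the task, then the constraints, then decide."

# Each condition adds or removes exactly one fragment (assumed layout).
CONDITIONS: Dict[str, List[str]] = {
    "baseline": [],
    "role_only": [ROLE],
    "context_injection": [ROLE, CONTEXT],
    "structured_reasoning": [ROLE, STRUCTURE],
    "full_stack": [ROLE, CONTEXT, STRUCTURE],
}


def is_pass(answer: str) -> bool:
    """Crude scorer: pass if the reply mentions driving at all.

    A real scorer would need to reject replies like "walk, don't drive".
    """
    return "drive" in answer.lower()


def run_study(ask: Callable[[str, str], str], trials: int = 20) -> Dict[str, float]:
    """Call ask(system_prompt, question) `trials` times per condition
    and return the pass rate for each condition."""
    passes: Counter = Counter()
    for name, parts in CONDITIONS.items():
        system_prompt = "\n\n".join(parts)
        for _ in range(trials):
            if is_pass(ask(system_prompt, QUESTION)):
                passes[name] += 1
    return {name: passes[name] / trials for name in CONDITIONS}
```

<p>Passing the model call in as <code>ask</code> keeps the harness independent of any particular API client, so the same loop can be pointed at a stub for testing or at a real model for the actual runs.</p>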
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47087746">https://news.ycombinator.com/item?id=47087746</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 20 Feb 2026 13:23:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47087746</link><dc:creator>midmost44</dc:creator><comments>https://news.ycombinator.com/item?id=47087746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47087746</guid></item><item><title><![CDATA[New comment by midmost44 in "TaskForge – auditable, secure, framework for OpenClaw"]]></title><description><![CDATA[
<p>Wow. How did you handle the token issue?</p>
]]></description><pubDate>Wed, 18 Feb 2026 09:27:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47059019</link><dc:creator>midmost44</dc:creator><comments>https://news.ycombinator.com/item?id=47059019</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47059019</guid></item><item><title><![CDATA[New comment by midmost44 in "Claude Sonnet 4.6"]]></title><description><![CDATA[
<p>I tested the API version. It beats Opus 4, lol. I saved 5x on cost!</p>
]]></description><pubDate>Wed, 18 Feb 2026 09:26:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47059012</link><dc:creator>midmost44</dc:creator><comments>https://news.ycombinator.com/item?id=47059012</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47059012</guid></item></channel></rss>