<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: someguy101010</title><link>https://news.ycombinator.com/user?id=someguy101010</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 15 May 2026 16:17:35 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=someguy101010" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by someguy101010 in "Codex is now in the ChatGPT mobile app"]]></title><description><![CDATA[
<p>if i didn't have to prompt it to learn from its mistakes and it just "intuitively" knew to do that</p>
]]></description><pubDate>Fri, 15 May 2026 03:18:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48144152</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=48144152</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48144152</guid></item><item><title><![CDATA[Slowing Down My Coding Agents to Get More Done]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.robw.fyi/2026/05/11/slowing-down-my-coding-agents-to-get-more-done/">https://www.robw.fyi/2026/05/11/slowing-down-my-coding-agents-to-get-more-done/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48122557">https://news.ycombinator.com/item?id=48122557</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 13 May 2026 14:39:01 +0000</pubDate><link>https://www.robw.fyi/2026/05/11/slowing-down-my-coding-agents-to-get-more-done/</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=48122557</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48122557</guid></item><item><title><![CDATA[Coloring Code: How Compilers Use Graph Theory [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=K3mi2m7ccDQ">https://www.youtube.com/watch?v=K3mi2m7ccDQ</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47949610">https://news.ycombinator.com/item?id=47949610</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 29 Apr 2026 15:15:40 +0000</pubDate><link>https://www.youtube.com/watch?v=K3mi2m7ccDQ</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=47949610</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47949610</guid></item><item><title><![CDATA[New comment by someguy101010 in "Launch HN: Vela (YC W26) – AI for complex scheduling"]]></title><description><![CDATA[
<p>i've built in this space, which led me to develop a MiniZinc MCP server [0] for scheduling bocce tournaments [1]. scheduling with constraints is an NP-hard problem, so it makes sense that people struggle with it. tools exist to solve this problem, but they are complex and hard to use for non-technical folks, and even for technical folks. i'm hoping a tool like this can bridge the gap, and i wanted to bring it to your attention if you aren't already thinking about the problem this way :)<p>edit: after reading a bit more of the description, it looks like y'all are taking a similar approach, kudos!<p>[0] <a href="https://github.com/r33drichards/minizinc-mcp" rel="nofollow">https://github.com/r33drichards/minizinc-mcp</a><p>[1] <a href="https://github.com/r33drichards/bocce-scheduler" rel="nofollow">https://github.com/r33drichards/bocce-scheduler</a></p>
]]></description><pubDate>Thu, 05 Mar 2026 18:29:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47265313</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=47265313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47265313</guid></item><item><title><![CDATA[New comment by someguy101010 in "When does MCP make sense vs CLI?"]]></title><description><![CDATA[
<p>yep! that's the motivation behind <a href="https://github.com/r33drichards/mcp-js" rel="nofollow">https://github.com/r33drichards/mcp-js</a><p>I want to be able to give agents access to computation in a secure way, without giving them full access to a computer.</p>
]]></description><pubDate>Sun, 01 Mar 2026 19:08:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47209652</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=47209652</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47209652</guid></item><item><title><![CDATA[New comment by someguy101010 in "Show HN: A MitM proxy to see what your LLM tools are sending"]]></title><description><![CDATA[
<p>Does this support Bedrock?</p>
]]></description><pubDate>Thu, 29 Jan 2026 02:01:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=46804762</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46804762</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46804762</guid></item><item><title><![CDATA[Show HN: Cua-Bench – a benchmark for AI agents in GUI environments]]></title><description><![CDATA[
<p>Hey HN, we're excited to share Cua-Bench (<a href="https://github.com/trycua/cua" rel="nofollow">https://github.com/trycua/cua</a>), an open-source framework for evaluating and training computer-use agents across different environments.<p>Computer-use agents show massive performance variance across different UIs: an agent with 90% success on Windows 11 might drop to 9% on Windows XP for the same task. The cause is variation in OS themes, browser versions, and UI details that existing benchmarks don't capture.<p>The existing benchmarks (OSWorld, Windows Agent Arena, AndroidWorld) were great but operated in silos, with different harnesses, different formats, and no standardized way to test the same agent across platforms. More importantly, they were evaluation-only. We needed environments that could generate training data and run RL loops, not just measure performance.
<p>Cua-Bench takes a different approach: it's a unified framework that standardizes environments across platforms and supports the full agent development lifecycle: benchmark, train, deploy.<p>With Cua-Bench, you can:<p>- Evaluate agents across multiple benchmarks with one CLI (native tasks + OSWorld + Windows Agent Arena adapters)<p>- Test the same agent on different OS variations (Windows 11/XP/Vista, macOS themes, Linux, Android via QEMU)<p>- Generate new tasks from natural language prompts<p>- Create simulated environments for RL training (shell apps like Spotify and Slack, with programmatic rewards)<p>- Run oracle validations to verify environments before agent evaluation<p>- Monitor agent runs in real time with traces and screenshots<p>All of this works on macOS, Linux, Windows, and Android, and is self-hostable.<p>To get started:<p>Install cua-bench:<p>% pip install cua-bench<p>Run a basic evaluation:<p>% cb run dataset datasets/cua-bench-basic --agent demo<p>Open the monitoring dashboard:<p>% cb run watch &lt;run_id&gt;<p>For parallelized evaluations across multiple workers:<p>% cb run dataset datasets/cua-bench-basic --agent your-agent --max-parallel 8<p>Want to test across different OS variations? Just specify the environment:<p>% cb run task slack_message --agent your-agent --env windows_xp<p>% cb run task slack_message --agent your-agent --env macos_sonoma<p>Generate new tasks from prompts:<p>% cb task generate "book a flight on kayak.com"<p>Validate environments with oracle implementations:<p>% cb run dataset datasets/cua-bench-basic --oracle<p>The simulated environments are particularly useful for RL training: they're HTML/JS apps that render across 10+ OS themes with programmatic reward verification. 
No need to spin up actual VMs for training loops.<p>We're seeing teams use Cua-Bench for:<p>- Training computer-use models on mobile and desktop environments<p>- Generating large-scale training datasets (working with labs on millions of screenshots across OS variations)<p>- RL fine-tuning with shell app simulators<p>- Systematic evaluation across OS themes and browser versions<p>- Building task registries (collaborating with Snorkel AI on task design and data curation, similar to their Terminal-Bench work)<p>Cua-Bench is 100% open-source under the MIT license. We're actively developing it as part of Cua (<a href="https://github.com/trycua/cua" rel="nofollow">https://github.com/trycua/cua</a>), our Computer Use Agent SDK, and we'd love your feedback, bug reports, or feature ideas.<p>GitHub: <a href="https://github.com/trycua/cua" rel="nofollow">https://github.com/trycua/cua</a><p>Docs: <a href="https://cua.ai/docs/cuabench">https://cua.ai/docs/cuabench</a><p>Technical Report: <a href="https://cuabench.ai" rel="nofollow">https://cuabench.ai</a><p>We'll be here to answer any technical questions and look forward to your comments!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46768906">https://news.ycombinator.com/item?id=46768906</a></p>
<p>Points: 40</p>
<p># Comments: 8</p>
]]></description><pubDate>Mon, 26 Jan 2026 17:46:22 +0000</pubDate><link>https://github.com/trycua/cua</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46768906</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46768906</guid></item><item><title><![CDATA[Solve Hi-Q with AlphaZero and Curriculum Learning]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.robw.fyi/2025/12/28/solve-hi-q-with-alphazero-and-curriculum-learning/">https://www.robw.fyi/2025/12/28/solve-hi-q-with-alphazero-and-curriculum-learning/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46420882">https://news.ycombinator.com/item?id=46420882</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 29 Dec 2025 14:06:33 +0000</pubDate><link>https://www.robw.fyi/2025/12/28/solve-hi-q-with-alphazero-and-curriculum-learning/</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46420882</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46420882</guid></item><item><title><![CDATA[New comment by someguy101010 in "Skills for organizations, partners, the ecosystem"]]></title><description><![CDATA[
<p>Is it possible to provide an LLM with a skill through the MCP resources feature?</p>
]]></description><pubDate>Thu, 18 Dec 2025 17:57:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46316150</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46316150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46316150</guid></item><item><title><![CDATA[New comment by someguy101010 in "Why Windows XP is the ultimate AI benchmark"]]></title><description><![CDATA[
<p>as an infrastructure engineer, the idea of training computer-use agents without provisioning infrastructure sounds amazing!<p>a common use case i run into is configuring corporate VPN software on Windows machines. is there a getting-started guide i could try this out with?</p>
]]></description><pubDate>Tue, 16 Dec 2025 16:31:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=46290595</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46290595</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46290595</guid></item><item><title><![CDATA[New comment by someguy101010 in "Dagger: Define software delivery workflows and dev environments"]]></title><description><![CDATA[
<p>i've used it, and i do like it, but the licensing situation is not great. It's open source, but it's not free software by any means.</p>
]]></description><pubDate>Sun, 14 Dec 2025 14:30:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46263256</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46263256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46263256</guid></item><item><title><![CDATA[New comment by someguy101010 in "The "confident idiot" problem: Why AI needs hard rules, not vibe checks"]]></title><description><![CDATA[
<p>wrote about this a bit too in <a href="https://www.robw.fyi/2025/10/24/simple-control-flow-for-automatically-steering-agents/" rel="nofollow">https://www.robw.fyi/2025/10/24/simple-control-flow-for-auto...</a><p>ran into this when writing agents to fix unit tests. they would often give up early, so i started writing the verifiers directly into the agent's control flow, and this produced much more reliable results. i believe Claude Code has hooks that do something similar.</p>
]]></description><pubDate>Mon, 08 Dec 2025 14:06:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46192344</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46192344</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46192344</guid></item><item><title><![CDATA[New comment by someguy101010 in "Isn't WSL2 just a VM?"]]></title><description><![CDATA[
<p>clearly you have never worked in <i>enterprise</i></p>
]]></description><pubDate>Mon, 01 Dec 2025 20:01:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46112373</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46112373</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46112373</guid></item><item><title><![CDATA[New comment by someguy101010 in "Ghostty compiled to WASM with xterm.js API compatibility"]]></title><description><![CDATA[
<p>nice one, Kyle! you could add <a href="https://github.com/wasmerio/webassembly.sh" rel="nofollow">https://github.com/wasmerio/webassembly.sh</a> and have a fully featured in-browser shell with support for installing packages!</p>
]]></description><pubDate>Mon, 01 Dec 2025 19:16:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=46111726</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46111726</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46111726</guid></item><item><title><![CDATA[New comment by someguy101010 in "The Thinking Game Film – Google DeepMind documentary"]]></title><description><![CDATA[
<p>reposting this from a YouTube comment:<p>From 1:14:55-1:15:20, within the span of 25 seconds, the way Demis spoke about releasing all known sequences without a shred of doubt was so amazing to see. There wasn't a single second where he worried about the business side of it (profits, earnings, shareholders, investors); he just knew it had to be open source for the betterment of the world. Gave me goosebumps. I watched that on repeat more than 10 times.</p>
]]></description><pubDate>Sun, 30 Nov 2025 18:44:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46099204</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=46099204</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46099204</guid></item><item><title><![CDATA[Simple Control Flow for Automatically Steering Agents]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.robw.fyi/2025/10/24/simple-control-flow-for-automatically-steering-agents/">https://www.robw.fyi/2025/10/24/simple-control-flow-for-automatically-steering-agents/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45713265">https://news.ycombinator.com/item?id=45713265</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 26 Oct 2025 16:48:16 +0000</pubDate><link>https://www.robw.fyi/2025/10/24/simple-control-flow-for-automatically-steering-agents/</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=45713265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45713265</guid></item><item><title><![CDATA[New comment by someguy101010 in "Constraint satisfaction to optimize item selection for bundles in Minecraft"]]></title><description><![CDATA[
<p>I opt for the greedy strategy in most gameplay scenarios, for pretty much the reasons you described here. I was considering making a mod to do this for me and was looking for a more "correct" solution, but greedy is way simpler and just as effective in most cases.</p>
]]></description><pubDate>Sun, 12 Oct 2025 23:19:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45562963</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=45562963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45562963</guid></item><item><title><![CDATA[New comment by someguy101010 in "Constraint satisfaction to optimize item selection for bundles in Minecraft"]]></title><description><![CDATA[
<p>Thanks for catching this!</p>
]]></description><pubDate>Sun, 12 Oct 2025 22:22:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45562558</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=45562558</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45562558</guid></item><item><title><![CDATA[Constraint satisfaction to optimize item selection for bundles in Minecraft]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.robw.fyi/2025/10/12/using-constraint-satisfaction-to-optimize-item-selection-for-bundles-in-minecraft/">https://www.robw.fyi/2025/10/12/using-constraint-satisfaction-to-optimize-item-selection-for-bundles-in-minecraft/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45560535">https://news.ycombinator.com/item?id=45560535</a></p>
<p>Points: 41</p>
<p># Comments: 11</p>
]]></description><pubDate>Sun, 12 Oct 2025 18:31:51 +0000</pubDate><link>https://www.robw.fyi/2025/10/12/using-constraint-satisfaction-to-optimize-item-selection-for-bundles-in-minecraft/</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=45560535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45560535</guid></item><item><title><![CDATA[New comment by someguy101010 in "Kitten TTS: 25MB CPU-Only, Open-Source Voice Model"]]></title><description><![CDATA[
<p>if the people who develop and release these models were all optimizing for the same goals, they could converge on similar strategies or behaviors without coordinating.</p>
]]></description><pubDate>Wed, 06 Aug 2025 02:49:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44807080</link><dc:creator>someguy101010</dc:creator><comments>https://news.ycombinator.com/item?id=44807080</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44807080</guid></item></channel></rss>