<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: monoid73</title><link>https://news.ycombinator.com/user?id=monoid73</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 17:50:15 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=monoid73" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by monoid73 in "Show HN: Browser AI agent platform designed for reliability"]]></title><description><![CDATA[
<p>for the hybrid workflows, curious: how do you decide which parts need AI reasoning vs. which can be hardcoded? is it adaptive or manual config?</p>
]]></description><pubDate>Thu, 07 Aug 2025 18:04:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44828128</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=44828128</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44828128</guid></item><item><title><![CDATA[Show HN: Open Operator Evals – real-world benchmarks for LLM web agents]]></title><description><![CDATA[
<p>We’ve open-sourced a benchmark for LLM-driven web agent setups.<p>It evaluates real-world tasks, like logging in, scraping dashboards, and submitting forms, using structured criteria: success rate, latency, and task reliability.<p>Everything is fully reproducible, with all outputs, logs, and evaluation data available.<p><a href="https://github.com/nottelabs/open-operator-evals">https://github.com/nottelabs/open-operator-evals</a><p>Feedback, critiques, or contributions welcome :)</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44318333">https://news.ycombinator.com/item?id=44318333</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 19 Jun 2025 13:03:56 +0000</pubDate><link>https://github.com/nottelabs/open-operator-evals</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=44318333</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44318333</guid></item><item><title><![CDATA[New comment by monoid73 in "Void: Open-source Cursor alternative"]]></title><description><![CDATA[
<p>Another one? People saw that $3B Windsurf money.</p>
]]></description><pubDate>Thu, 08 May 2025 18:26:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43929546</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43929546</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43929546</guid></item><item><title><![CDATA[New comment by monoid73 in "Ask HN: How to find a job as Java software developer in USA?"]]></title><description><![CDATA[
<p>i think the visa hurdle is the big one. even with a strong background, a lot of companies hesitate unless they already have an immigration pipeline set up. another angle could be looking for remote roles at US companies first, then trying to convert that into relocation later. it's a longer path but sometimes more realistic.
good luck.</p>
]]></description><pubDate>Sun, 27 Apr 2025 11:34:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=43811168</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43811168</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43811168</guid></item><item><title><![CDATA[New comment by monoid73 in "Acquisitions, consolidation, and innovation in AI"]]></title><description><![CDATA[
<p>I think the UX of ChatGPT works because it's familiar, not because it's good. It lowers friction for new users but doesn't scale well to more complex workflows. If you're building anything beyond Q&A or simple tasks, you run into limitations fast. There's still plenty of space for apps that treat the model as a backend and build real interaction layers on top, especially for use cases that aren’t served by a chat metaphor.</p>
]]></description><pubDate>Thu, 24 Apr 2025 21:14:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=43787588</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43787588</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43787588</guid></item><item><title><![CDATA[New comment by monoid73 in "AI is right about em-dashes"]]></title><description><![CDATA[
<p>funny enough, i started noticing em dashes mostly through using GPT. wasn’t really part of my writing before, but now i find them super useful for managing rhythm and flow. 
definitely earned their place — not because LLMs use them, but because they actually work.
(says ChatGPT in response to this post)</p>
]]></description><pubDate>Thu, 24 Apr 2025 21:11:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=43787567</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43787567</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43787567</guid></item><item><title><![CDATA[New comment by monoid73 in "Teaching LLMs how to solid model"]]></title><description><![CDATA[
<p>this is one of the more compelling "LLM meets real-world tool" use cases i've seen. OpenSCAD makes a great testbed since it's text-based and deterministic, but i wonder what the limits are once you get into more complex assemblies or freeform surfacing.<p>curious if the real long-term unlock will come from hybrid workflows: LLMs proposing parameterized primitives, humans refining them in a UI, then LLMs iterating on that feedback. kind of like pair programming, but for CAD.</p>
]]></description><pubDate>Wed, 23 Apr 2025 20:30:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=43776331</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43776331</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43776331</guid></item><item><title><![CDATA[New comment by monoid73 in "Can a single AI model advance any field of science?"]]></title><description><![CDATA[
<p>exactly. hindsight bias makes it really hard to separate genuine inference from subtle prompt leakage. even framing the question can accidentally steer it toward the right answer. would be interesting to try with completely synthetic problems first just to test the method.</p>
]]></description><pubDate>Tue, 22 Apr 2025 20:52:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=43766179</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43766179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43766179</guid></item><item><title><![CDATA[New comment by monoid73 in "Sapphire: Rust based package manager for macOS"]]></title><description><![CDATA[
<p>same here. brew’s been great historically but it’s gotten bloated and kinda slow. curious to see if sapphire can keep things lean without sacrificing compatibility.</p>
]]></description><pubDate>Tue, 22 Apr 2025 20:44:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43766101</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43766101</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43766101</guid></item><item><title><![CDATA[New comment by monoid73 in "Local LLM inference – impressive but too hard to work with"]]></title><description><![CDATA[
<p>yeah, that'd be nice: some kind of self-bootstrapping system where you start with a strong cloud model, then fine-tune a smaller local one over time until it’s good enough to take over. the tricky part is managing quality drift and deciding when it's 'good enough' without tanking UX. edge hardware's catching up though, so it feels more feasible by the day.</p>
]]></description><pubDate>Mon, 21 Apr 2025 22:21:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=43757145</link><dc:creator>monoid73</dc:creator><comments>https://news.ycombinator.com/item?id=43757145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43757145</guid></item></channel></rss>