<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: yurimo</title><link>https://news.ycombinator.com/user?id=yurimo</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 10 Jun 2026 07:52:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=yurimo" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"]]></title><description><![CDATA[
<p>I personally don't view coding agents making software as "software gotten better": you are comparing a tool and the end result, and these are two different things. The agent you use going down and your product going down mean two different things to your customers. I will not deny that we have made incredible progress in coding and, hell, even design over the past 3.5 years; this technology is here to stay.<p>That being said, while I agree that measuring software quality is vague (part of the reason it is hard for models as well), there are universal things I believe every engineer will agree on. Reliability, uptime, customer feedback, legibility of your engineering, performance: these are things we have often optimized for. Google Maps is a bit of a strawman because neither of us (unless you work on it) knows how much agent code there is; I think it is likely that there is little, since it was working fine prior to 2023. I could bring up GitHub reliability as an example, given how much Copilot usage they promote at MS, but once again only folks there know for certain. I do, however, see scores of AI-powered SaaS products that look like they are in a perpetual MVP state. I think you are right that even if agents give us "good enough" results and we can swallow failure rates and our ever-shrinking understanding of what we, or more so the model, created, then it is still progress overall. But this is progress not toward human-AI collaboration but toward AI-only engineering IMO, which is good or bad depending on how you view the future.<p>I'm a scientist and most of the code I currently write is somewhere at the intersection of critical software and machine learning. Squaring these two is not easy, and I guess the way I was taught to reason about engineering informs my opinions on this. Maybe it's just a matter of time before Codex can help here in an unconstrained manner as well, but I am skeptical at the moment.</p>
]]></description><pubDate>Sun, 07 Jun 2026 07:02:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48432563</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=48432563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48432563</guid></item><item><title><![CDATA[New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"]]></title><description><![CDATA[
<p>You can't really do that here because one of the key arguments for this, the one people in the thread focus on, is the "1/10th of the time" estimate. The comparison with humans is already here, albeit as just an estimate; no actual comparison has been done.<p>In my opinion this is a problem of conflicting incentives that exists today. Companies will market greater human-AI collaboration in science and engineering but focus on releasing things like this, where it is clear that the downstream goal is complete agent ownership of the product, from inception to testing to monitoring. Maybe the speculative future agents will code in their own very efficient language that won't be readable for people at all. As you've said, the article focuses on agent code being readable by agents. But in my mind, in at least the near future, there will be a case where your prod breaks and you won't be able to understand it or the attempted fixes. Maybe the agent will fail to fix it at all and start a massive rewrite. In any case, is this any different from kicking technical debt down the road, along with worse interpretability of what you have built?<p>I do think there is a way for an agent to write great, solid code that we can read, but with the way LLMs are built this requires something new in terms of reward that accounts for "taste" and constant refinement, so it might take more than 1/10th of the time to produce something good.</p>
]]></description><pubDate>Sun, 07 Jun 2026 05:51:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48432217</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=48432217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48432217</guid></item><item><title><![CDATA[New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"]]></title><description><![CDATA[
<p>I think the telling part is in this line:<p>> Because the repository is entirely agent-generated, it’s optimized first for Codex’s legibility<p>I asked my question from the perspective of a human engineer, as in, I will have to read the code, understand it, and fix it once it breaks. OpenAI's approach is the opposite: even if it is breaking, it is the agent that will be doing the fixing, so millions of lines and inelegant designs don't matter because human readability doesn't matter. In any case you use more tokens, so you fork over more money.<p>I will say, however, that IMHO there is objectively bad and good code in terms of what it can do and how it performs. If I can do the same thing in 50 lines as opposed to 1000 lines, this difference still matters for the model: smaller context usage and a better approach that informs downstream generation.</p>
]]></description><pubDate>Sun, 07 Jun 2026 05:32:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=48432116</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=48432116</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48432116</guid></item><item><title><![CDATA[New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"]]></title><description><![CDATA[
<p>What I still can't understand is why a massive amount of generated code is a flex. I don't feel that software has gotten a lot better in the past 3 years, only sloppier. It's surprising to me that people who know about reward hacking choose a simple objective like lines of code generated as a signal for quality. I'd argue you have to optimize for as few generated lines as possible, with readability for humans as a secondary objective. I suspect providers don't see it as a problem because more lines generated means more tokens used and hence more billed to customers.<p>And if I am working on an existing codebase, isn't a good commit often a net negative between added and removed lines? I don't want to bloat my codebase but make it more polished and elegant. After reading that, I wonder if what they have done could have been accomplished with a far smaller LoC budget.</p>
]]></description><pubDate>Sun, 07 Jun 2026 02:25:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48431161</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=48431161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48431161</guid></item><item><title><![CDATA[New comment by yurimo in "PyTorch Landscape"]]></title><description><![CDATA[
<p>Great stuff, thanks for posting.</p>
]]></description><pubDate>Tue, 19 May 2026 07:29:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48190352</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=48190352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48190352</guid></item><item><title><![CDATA[New comment by yurimo in "Slop is not necessarily the future"]]></title><description><![CDATA[
<p>> No one has ever made a purchasing decision based on how good your code is.<p>When I make a purchasing decision I expect the payment to go through quickly and correctly and whatever I purchase to arrive in a reasonable time. All of this rests on the software's reputation for being solid. If a user catches a whiff of a purchase not being executed correctly, or of money or goods going somewhere else, it is a death sentence for your company.<p>The industry is now pushing for an agentic web where agents can do this on your behalf. But if we have slop foundations and then add unstable models that can hallucinate and make mistakes on top of them, it's just a recipe for catastrophe. I think relegating 2) to the category of mission-critical software only ignores how much reliability goes into the everyday services people use.</p>
]]></description><pubDate>Wed, 01 Apr 2026 04:58:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47596998</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=47596998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47596998</guid></item><item><title><![CDATA[New comment by yurimo in "HyperAgents: Self-referential self-improving agents"]]></title><description><![CDATA[
<p>Sigh, as someone who does research in this area, this paper and its promotion on X have so many hype terms that it is almost off-putting. If you read the paper, what they are doing is modifying the scaffolding around a frozen FM until they get something better. Obviously none of this includes any training (changes to weights) or changes to the underlying architecture. Even for the scaffolding, a lot is still human-designed: the outer loop (parent selection, evaluation protocol, task distribution) is mostly fixed. They experimented with editing parent selection, and it rediscovers heuristics like UCB/softmax but doesn’t yet beat handcrafted versions, so a lot of the improvements in metrics are incremental, which is okay; that is what research often is. But it's not the runaway self-improvement or "improve forever" that people spin online.<p>It is an extension of their DGM paper.
Also, I think it's ~88M+ tokens per full run, which is not surprising since any sort of exploratory search is expensive, and I commend them for releasing the code online because it pushes this small subfield forward. But people need to temper their expectations. IMO the best part is the nice transfer between improvement objectives that they found after exhaustive iteration. I am wondering if what we have here is a way to exhaust the local search space by letting the model better express it.<p>On a separate note, one thing I think a lot about is whether these unchecked hyped claims and terms and the marketing of papers actually do more harm than good to the field, by setting expectations that cannot be delivered and distracting from the hard and unsexy nature of the problems that need to be solved.</p>
]]></description><pubDate>Fri, 27 Mar 2026 06:02:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47539504</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=47539504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47539504</guid></item><item><title><![CDATA[New comment by yurimo in "Google's year in review: areas with research breakthroughs in 2025"]]></title><description><![CDATA[
<p>Quanta? They do recaps by field every year. Have been a big fan for a while.</p>
]]></description><pubDate>Wed, 24 Dec 2025 12:47:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46375120</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=46375120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46375120</guid></item><item><title><![CDATA[Neo the Home Robot [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=LTYMWadOW7c">https://www.youtube.com/watch?v=LTYMWadOW7c</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45738102">https://news.ycombinator.com/item?id=45738102</a></p>
<p>Points: 7</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 28 Oct 2025 19:49:17 +0000</pubDate><link>https://www.youtube.com/watch?v=LTYMWadOW7c</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=45738102</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45738102</guid></item><item><title><![CDATA[New comment by yurimo in "Coral NPU: A full-stack platform for Edge AI"]]></title><description><![CDATA[
<p>I'm guessing you can still find Coral TPU-based boards somewhere, but I'm not sure what the support for these will be now that the focus is shifting. Coral TPU also uses a subset of TensorFlow, and it's nice to see that the open standard is targeting JAX and PyTorch.<p>When I went to see if anyone is selling the boards, or to check their "Partners" page regarding manufacturing design, I got a 404 even after signing in: <a href="https://developers.google.com/coral/guides/coral/resource" rel="nofollow">https://developers.google.com/coral/guides/coral/resource</a></p>
]]></description><pubDate>Sun, 19 Oct 2025 03:38:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=45631931</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=45631931</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45631931</guid></item><item><title><![CDATA[New comment by yurimo in "iRobot Founder: Don't Believe the AI and Robotics Hype"]]></title><description><![CDATA[
<p>Rodney Brooks is a widely recognized and celebrated roboticist who ran CSAIL for a while. iRobot as a company created a new market and managed to put a functional household robot out there. Whether Chinese alternatives ate its market share is largely irrelevant to his argument on humanoids, which I find to be completely reasonable.</p>
]]></description><pubDate>Tue, 30 Sep 2025 02:05:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45421179</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=45421179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45421179</guid></item><item><title><![CDATA[Stationery – Japanology Plus [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=_I-5EggKDyo">https://www.youtube.com/watch?v=_I-5EggKDyo</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45329651">https://news.ycombinator.com/item?id=45329651</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 22 Sep 2025 06:13:25 +0000</pubDate><link>https://www.youtube.com/watch?v=_I-5EggKDyo</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=45329651</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45329651</guid></item><item><title><![CDATA[New comment by yurimo in "Enough AI copilots, we need AI HUDs"]]></title><description><![CDATA[
<p>I might be wrong, but isn't the HUD the author is suggesting for coding basically AREPL? For debugging I can see it working, but I feel a chatbox and inline Q&A have a wider application.<p>On a wider note, I buy the argument for alternative interfaces other than chat, but chat permeates our lives every day; the smartphone is full of chat interfaces. A HUD might be good for AR glasses though, a literal HUD.</p>
]]></description><pubDate>Mon, 28 Jul 2025 06:04:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44707668</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=44707668</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44707668</guid></item><item><title><![CDATA[New comment by yurimo in "The year of peak might and magic"]]></title><description><![CDATA[
<p>I believe there is a new one coming out that is closer to HOMM3, called Olden Era. 
<a href="https://unfrozen.studio/games/olden-era/" rel="nofollow">https://unfrozen.studio/games/olden-era/</a><p>I did try Songs of Conquest, it was decent</p>
]]></description><pubDate>Sat, 19 Jul 2025 04:05:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44612500</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=44612500</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44612500</guid></item><item><title><![CDATA[Text-to-LoRA: Instant Transformer Adaption]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.arxiv.org/pdf/2506.06105">https://www.arxiv.org/pdf/2506.06105</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44254700">https://news.ycombinator.com/item?id=44254700</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 12 Jun 2025 06:09:54 +0000</pubDate><link>https://www.arxiv.org/pdf/2506.06105</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=44254700</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44254700</guid></item><item><title><![CDATA[New comment by yurimo in "Ask HN: What are you working on? (April 2025)"]]></title><description><![CDATA[
<p>Trying to make interpretability research practical. It's a bit early for a demo, but I am getting some interesting results on how large multimodal models reason.</p>
]]></description><pubDate>Mon, 28 Apr 2025 07:17:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43818516</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=43818516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43818516</guid></item><item><title><![CDATA[New comment by yurimo in "Helix: A vision-language-action model for generalist humanoid control"]]></title><description><![CDATA[
<p>I don't know, there have been so many overhyped and faked demos in the humanoid robotics space over the last couple of years that it is difficult to believe what is clearly a demo release for shareholders. I would love to see a demonstration in a less controlled environment.</p>
]]></description><pubDate>Thu, 20 Feb 2025 16:53:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=43117060</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=43117060</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43117060</guid></item><item><title><![CDATA[New comment by yurimo in "Magma: A foundation model for multimodal AI agents"]]></title><description><![CDATA[
<p>Multimodal agents notoriously fail at long-horizon tasks. How does Magma perform on them?</p>
]]></description><pubDate>Thu, 20 Feb 2025 07:00:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=43111900</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=43111900</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43111900</guid></item><item><title><![CDATA[New comment by yurimo in "My Struggle with Doom Scrolling"]]></title><description><![CDATA[
<p>The iPhone has a nice accessibility feature that lets you switch the screen to greyscale. This, along with putting the phone far enough away that I would have to get up and walk to it, made a huge difference in how often I use it.</p>
]]></description><pubDate>Wed, 22 Jan 2025 19:15:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=42796409</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=42796409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42796409</guid></item><item><title><![CDATA[New comment by yurimo in "Calibrating Recommendations to Better Match User Interests"]]></title><description><![CDATA[
<p>How do you define and quantify ‘calibration’ in this context? Is it purely based on aligning recommendations with explicit user preferences, or are you also trying to infer latent interests?</p>
]]></description><pubDate>Thu, 19 Dec 2024 16:19:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=42462935</link><dc:creator>yurimo</dc:creator><comments>https://news.ycombinator.com/item?id=42462935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42462935</guid></item></channel></rss>