Hacker News: yurimo

New comment by yurimo in "Terence Tao: Mathematics in the Age of AI [pdf]"

yurimo — Sun, 26 Jul 2026 19:29:12 +0000

I think it is important to discern what AI is currently good for and what it is not, even in verifiable environments like math. The current hyped announcements about breaking conjectures are notable and are a marker of how much improvement has been made. But as I read them, and maybe I am wrong, I see a large model + harness executing a broad brute-force search and trying solutions until something sticks. There are lots of problems like that and they should be solved, as they are often less important, overlooked, or just a slog. Any field of research has these, math even more so.

However, this is very different from inventing new mathematical machinery that allows us to crack old problems. I think it will be a while before AI is able to do that, if at all. For now I think we will be moving toward a symbiosis where an AI cracking a problem and giving a solution inspires a human to invent new techniques.

New comment by yurimo in "Writing by hand is good for your brain"

yurimo — Fri, 24 Jul 2026 05:57:20 +0000

Or use sticky notes! My friend annotated her favorite book with notes for me and it was awesome.

New comment by yurimo in "Writing by hand is good for your brain"

yurimo — Fri, 24 Jul 2026 05:55:43 +0000

Personally, I found the newest Paperlike to still feel a lot like glass; you barely get that paper feel at all, and I hated it. They tried to keep the clarity of the screen near glass level, but that affects the feel a lot. Eventually I did some research and found that Japanese screen protectors that give you the feel of Kent paper are excellent for writing, so I wholeheartedly recommend those. You can find the best ones by looking at the top-rated ones on Japanese websites; I have a Bellemond one.

New comment by yurimo in "Apple raises prices of MacBooks, iPads"

yurimo — Fri, 26 Jun 2026 05:04:59 +0000

I was considering getting an M5 Max MBP with 128GB, but at that price it is just ridiculous. The 64GB version now costs roughly what the 128GB one did before, and that was already a stretch for me. At this point I might as well stick with my ol' reliable M1 Pro MBP.

Edit: I'll say that it now also seems very hard to justify buying top-of-the-line Apple hardware for the enterprise. Getting top laptops for just a team of 10 people now means an extra $20k just for RAM, on top of an already higher base price.

New comment by yurimo in "An Introduction to YOLO26"

yurimo — Tue, 23 Jun 2026 05:18:06 +0000

Wow I'm old, I still remember working with YOLOv2.

New comment by yurimo in "macOS Container Machines"

yurimo — Wed, 10 Jun 2026 11:03:43 +0000

I'm pretty sure this is not the use case at all, but man do I miss Boot Camp. Even for games: if we could just run Linux without needing CrossOver, gaming on Macs would be a dream.

New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"

yurimo — Sun, 07 Jun 2026 07:02:53 +0000

I personally don't view coding agents making software as "software has gotten better". You are comparing a tool and the end result, and these are two different things. The agent you use going down and your product going down mean two different things to your customers. I will not deny that we have made incredible progress in coding and, hell, even design over the past 3.5 years; this technology is here to stay.

That being said, while I agree that measuring software quality is vague (part of the reason it is hard for models as well), there are universal things I believe every engineer will agree on: reliability, uptime, customer feedback, legibility of your engineering, performance. These are the things we have often optimized for. Google Maps is a bit of a strawman because neither of us (unless you work on it) knows how much agent code there is; I think it is likely that there's little, since it was working fine prior to 2023. I could bring up GitHub reliability as an example, given how much Copilot usage they promote at MS, but once again only folks there know for certain. I do, however, see scores of AI-powered SaaS products that look like they are in a perpetual MVP state. I think you are right that even if agents give us "good enough" results and we can swallow the failure rates and our ever-shrinking understanding of what we, or more so the model, created, then it is still progress overall. But this is progress not toward human-AI collaboration but toward AI-only engineering IMO, which is good or bad depending on how you view the future.

I'm a scientist and most of the code I currently write is somewhere at the intersection of critical software and machine learning. Squaring these two is not easy, and I guess the way I was taught to reason about engineering informs my opinions on this. Maybe it's just a matter of time before Codex can help here in an unconstrained manner as well, but I am skeptical at the moment.

New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"

yurimo — Sun, 07 Jun 2026 05:51:40 +0000

You can't really do that here, because one of the key arguments for this, and the one people in the thread focus on, is the "1/10th of the time" estimate. The comparison with humans is already there, although it is just an estimate and no actual comparison has been done.

This is a problem of conflicting incentives that exists today, in my opinion. Companies will market greater human-AI collaboration in science and engineering but focus on releasing things like this, where it is clear that the downstream goal is complete agent ownership of the product, from inception to testing to monitoring. Maybe the speculative future agents will code in their own very efficient language that won't be readable by people at all. As you've said, the article focuses on agent code being readable by agents. But in my mind, at least in the near future, there will be a case where your prod breaks and you won't be able to understand it or the attempted fixes. Maybe the agent will fail to fix it at all and start a massive rewrite. In any case, is this different from kicking technical debt down the road, along with worse interpretability of what you have built?

I do think there is a way for an agent to write great, solid code that we can read, but with the way LLMs are built this requires something new in terms of reward that accounts for "taste" and constant refinement, so it might take more than 1/10th of the time to produce something good.

New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"

yurimo — Sun, 07 Jun 2026 05:32:59 +0000

I think the telling part is in this line:

> Because the repository is entirely agent-generated, it’s optimized first for Codex’s legibility

I asked the question from the perspective of a human engineer, as in, I will have to read the code, understand it, and fix it once it breaks. OpenAI's approach is the opposite: even if it is breaking, it is the agent that will be doing the fixing, so millions of lines and inelegant designs don't matter because human readability doesn't matter. In any case you use more tokens, so you fork over more money.

I will say, however, that IMHO there is objectively bad and good code in terms of what it can do and how it performs. If I can do the same thing in 50 lines as opposed to 1000 lines, this difference still matters for the model: smaller context usage, and a better approach that informs downstream generation.

New comment by yurimo in "Harness engineering: Leveraging Codex in an agent-first world"

yurimo — Sun, 07 Jun 2026 02:25:39 +0000

What I still can't understand is why generating a massive amount of code is a flex. I don't feel that software has gotten a lot better in the past 3 years, only sloppier. It's surprising to me that people who know about reward hacking choose a simple objective like lines of code generated as a signal for quality. I'd argue you have to optimize for as few lines generated as possible, while the secondary optimization should be readability for humans. I suspect providers don't see it as a problem because more lines generated means more tokens used and hence more billed to customers.

And if I am working on an existing codebase, isn't a good commit often net negative in added versus removed lines? I don't want to bloat my codebase but to make it more polished and elegant. After reading that, I wonder if what they have done could have been accomplished with a far smaller LoC budget.
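To make the "net negative" idea concrete, here is a rough sketch of how one could measure it. It assumes you feed it the text produced by `git diff --numstat` (tab-separated added count, removed count, path, with `-` in the count columns for binary files). The function name `net_line_delta` is made up for illustration and is not from any tool mentioned in the thread.

```python
def net_line_delta(numstat: str) -> tuple[int, int, int]:
    """Return (added, removed, net) line counts from `git diff --numstat` text.

    Hypothetical helper for illustration. A negative net means the change
    removed more lines than it added.
    """
    added = removed = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        a, r = parts[0], parts[1]
        if a == "-" or r == "-":
            continue  # binary files report "-" instead of line counts
        added += int(a)
        removed += int(r)
    return added, removed, added - removed


# Example input in the numstat layout: added<TAB>removed<TAB>path
sample = "10\t40\tsrc/app.py\n-\t-\tlogo.png\n3\t0\tREADME.md\n"
added, removed, net = net_line_delta(sample)
print(f"+{added} -{removed} net {net}")
```

A net line count is a crude signal on its own, of course; it says nothing about readability, which is the other half of the argument.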

New comment by yurimo in "PyTorch Landscape"

yurimo — Tue, 19 May 2026 07:29:21 +0000

Great stuff, thanks for posting.

New comment by yurimo in "Slop is not necessarily the future"

yurimo — Wed, 01 Apr 2026 04:58:51 +0000

> No one has ever made a purchasing decision based on how good your code is.

When I make a purchasing decision I expect the payment to go through quickly and correctly and for whatever I purchase to arrive in a reasonable time. All of this rests on the software's reputation for being solid. If a user gets a whiff of a purchase not being executed correctly, or of money or goods going somewhere else, it is a death sentence for your company.

The industry is now pushing for an agentic web where agents can do this on your behalf. But if we have slop foundations and then add unstable models on top that can hallucinate and make mistakes, then it's just a recipe for catastrophe. I think relegating 2) to the category of mission-critical software only ignores how much reliability goes into the everyday services people use.

New comment by yurimo in "HyperAgents: Self-referential self-improving agents"

yurimo — Fri, 27 Mar 2026 06:02:29 +0000

Sigh, as someone who does research in this area, this paper and its promotion on X have so many hype terms that it is almost off-putting. If you read the paper, what they are doing is modifying the scaffolding around a frozen FM until they get something better. Obviously none of this involves any training (changes to weights) or changes to the underlying architecture. Even for the scaffolding, a lot is still human-scaffolded: the outer loop (parent selection, evaluation protocol, task distribution) is mostly fixed. They experimented with editing parent selection and it rediscovers heuristics like UCB/softmax, but doesn't yet beat handcrafted versions, so a lot of the improvements in metrics are incremental, which is okay; that is what research often is. But it's not the runaway self-improvement or "improve forever" that people spin online.
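For readers who haven't seen these heuristics, here is a minimal sketch of what UCB-style and softmax-style parent selection generally look like. This is the textbook UCB1 rule and a plain temperature softmax, not the paper's code; the function names and the idea of scoring each candidate parent by a mean benchmark score are my assumptions for illustration.

```python
import math
import random


def ucb1_select(mean_scores, visit_counts, c=math.sqrt(2)):
    """Pick a candidate index with the textbook UCB1 rule (illustrative only).

    mean_scores[i] is the average score seen for candidate i so far and
    visit_counts[i] is how many times it has been picked as a parent.
    """
    # Any candidate that has never been tried is picked first.
    for i, n in enumerate(visit_counts):
        if n == 0:
            return i
    total = sum(visit_counts)

    def bound(i):
        # Exploitation term plus an exploration bonus that shrinks with visits.
        return mean_scores[i] + c * math.sqrt(math.log(total) / visit_counts[i])

    return max(range(len(mean_scores)), key=bound)


def softmax_probs(scores, temperature=1.0):
    """Turn scores into selection probabilities; lower temperature is greedier."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_select(scores, temperature=1.0, rng=random):
    """Sample a candidate index in proportion to its softmax probability."""
    probs = softmax_probs(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# A rarely visited candidate gets a large exploration bonus under UCB1.
print(ucb1_select([0.6, 0.5], [100, 1]))
print(softmax_probs([1.0, 1.0]))
```

Both rules trade off exploiting the best-scoring parent against exploring others, which is why it is not too surprising that a search over selection strategies lands near them.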

It is an extension of their DGM paper. Also, it's ~88M+ tokens per full run I think, which is not surprising as any sort of exploratory search is expensive, and I commend them for releasing the code online because it pushes this small subfield forward. But people need to temper their expectations. IMO the best part is the nice transfer between improvement objectives that they found after exhaustive iteration. I am wondering if what we have here is a way to exhaust the local search space by letting the model better express it.

On a separate note, one thing I think about a lot is whether these unchecked hyped claims, terms, and marketing of papers actually do more harm than good to the field by setting expectations that cannot be met and distracting from the hard and unsexy nature of the problems that need to be solved.

New comment by yurimo in "Google's year in review: areas with research breakthroughs in 2025"

yurimo — Wed, 24 Dec 2025 12:47:12 +0000

Quanta? They do recaps by field every year. Have been a big fan for a while.

Neo the Home Robot [video]

yurimo — Tue, 28 Oct 2025 19:49:17 +0000

Article URL: https://www.youtube.com/watch?v=LTYMWadOW7c

Comments URL: https://news.ycombinator.com/item?id=45738102

Points: 7

# Comments: 1

New comment by yurimo in "Coral NPU: A full-stack platform for Edge AI"

yurimo — Sun, 19 Oct 2025 03:38:40 +0000

I'm guessing you can still find Coral TPU-based boards somewhere, but I'm not sure what the support for these will be now that the focus is shifting. Coral TPU also uses a subset of TensorFlow, and it's nice to see that the open standard is targeting JAX and PyTorch.

When I went to see whether anyone is selling the boards, or to check their "Partners" page regarding manufacturing design, I got a 404 even after signing in: https://developers.google.com/coral/guides/coral/resource

New comment by yurimo in "iRobot Founder: Don't Believe the AI and Robotics Hype"

yurimo — Tue, 30 Sep 2025 02:05:52 +0000

Rodney Brooks is a widely recognized and celebrated roboticist who ran CSAIL for a while. iRobot as a company created a new market and managed to put a functional household robot out there. Whether Chinese alternatives ate its market share is largely irrelevant to his argument on humanoids, which I find completely reasonable.

Stationery – Japanology Plus [video]

yurimo — Mon, 22 Sep 2025 06:13:25 +0000

Article URL: https://www.youtube.com/watch?v=_I-5EggKDyo

Comments URL: https://news.ycombinator.com/item?id=45329651

Points: 1

# Comments: 0

New comment by yurimo in "Enough AI copilots, we need AI HUDs"

yurimo — Mon, 28 Jul 2025 06:04:25 +0000

I might be wrong, but isn't the HUD the author is suggesting for coding basically AREPL? For debugging I can see it working, but I feel chat boxes and inline Q&A have a wider application.

On a wider note, I buy the argument for interfaces other than chat, but chat permeates our lives every day; our smartphones are full of chat interfaces. A HUD might be good for AR glasses though, a literal HUD.

New comment by yurimo in "The year of peak might and magic"

yurimo — Sat, 19 Jul 2025 04:05:25 +0000

I believe there is a new one coming out that is closer to HOMM3, called Olden Era. https://unfrozen.studio/games/olden-era/

I did try Songs of Conquest; it was decent.