<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cornel_io</title><link>https://news.ycombinator.com/user?id=cornel_io</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 10 Apr 2026 00:34:29 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cornel_io" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cornel_io in "I used AI. It worked. I hated it"]]></title><description><![CDATA[
<p>Yes, it can. So can I. But neither of us will write the code exactly the way nitpicky PR reviewer #2 demands it be written unless he makes his preferences clear somewhere. Even at a nitpick-hellhole like Google, that's mostly codified into a massive number of readability rules, which can in theory be found and followed. Elsewhere, most reviewer preferences are just individual quirks that you have to pick up on over time, and that's the kind of stuff that neither new employees nor Claude will ever get right in one shot.</p>
]]></description><pubDate>Sun, 05 Apr 2026 06:59:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646854</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=47646854</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646854</guid></item><item><title><![CDATA[New comment by cornel_io in "Claude Code Cheat Sheet"]]></title><description><![CDATA[
<p>Just tell Claude to create tmux sessions for each; it can figure out the rest.</p>
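<p>A minimal sketch of what that amounts to, assuming the claude CLI accepts an initial prompt argument (the task names and prompts here are made-up examples):</p>
<pre><code># Hypothetical sketch: one detached tmux session per task.
import subprocess

tasks = {
    "fix-tests": "claude 'fix the failing unit tests'",
    "write-docs": "claude 'document the public API'",
}

for name, cmd in tasks.items():
    # "tmux new-session -d -s NAME CMD" starts a detached, named session
    subprocess.run(["tmux", "new-session", "-d", "-s", name, cmd], check=True)

# Check on one later with: tmux attach -t fix-tests
</code></pre>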
]]></description><pubDate>Tue, 24 Mar 2026 05:05:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47498785</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=47498785</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47498785</guid></item><item><title><![CDATA[New comment by cornel_io in "Statement on the comments from Secretary of War Pete Hegseth"]]></title><description><![CDATA[
<p>As an investor in Anthropic, I'd say that anyone who wasn't aware all along of where they stood on various values issues should not have been putting money in; it was never hidden.</p>
]]></description><pubDate>Sat, 28 Feb 2026 04:31:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47190448</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=47190448</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47190448</guid></item><item><title><![CDATA[New comment by cornel_io in "Total monthly number of StackOverflow questions over time"]]></title><description><![CDATA[
<p>Asking questions on SO was an exercise in frustration, not "interacting with peers". I've never once had a productive interaction there; everything I've ever asked was either closed for dumb reasons or not answered at all. The library of past answers was more useful, but fell off hard for more recent tech, I assume because people were all having the same frustrations as I was and just stopped going there to ask anything.<p>I have plenty of real peers I interact with; I do not need that noise when I just need a quick answer to a technical question. LLMs are fantastic for this use case.</p>
]]></description><pubDate>Sun, 04 Jan 2026 01:56:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46483998</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=46483998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46483998</guid></item><item><title><![CDATA[New comment by cornel_io in "Resistance training load does not determine hypertrophy"]]></title><description><![CDATA[
<p>This article claims that's false: 8-12 reps at higher weight leads to the same result as 20+ reps at lower weight.</p>
]]></description><pubDate>Thu, 01 Jan 2026 10:21:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46452900</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=46452900</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46452900</guid></item><item><title><![CDATA[New comment by cornel_io in "Measuring AI Ability to Complete Long Tasks"]]></title><description><![CDATA[
<p>And at the end of the day it's not really a tradeoff we'll need to make, anyways: my experience with e.g. Claude Code is that every model iteration gets <i>much</i> better at avoiding balls of mud, even without tons of manual guidance and pleading.<p>I get that even now it's very easy to let stuff get out of hand if you aren't paying close attention to the actual code yourself, so people assume it's some fundamental limitation of all LLMs. But it's not, much like six-fingered hands were just a temporary state, not anything deep or necessary enforced by the diffusion architecture.</p>
]]></description><pubDate>Sun, 21 Dec 2025 06:11:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46342676</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=46342676</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46342676</guid></item><item><title><![CDATA[New comment by cornel_io in "I failed to recreate the 1996 Space Jam website with Claude"]]></title><description><![CDATA[
<p>Theoretical "proofs" of limitations like this are always unhelpful because they're too broad, and apply just as well to humans as they do to LLMs. The result is true but it doesn't actually apply any limitation that matters.</p>
]]></description><pubDate>Mon, 08 Dec 2025 05:40:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46188783</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=46188783</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46188783</guid></item><item><title><![CDATA[New comment by cornel_io in "You should write an agent"]]></title><description><![CDATA[
<p>When a tool call completes, the result is sent back to the LLM to decide what to do next; that's where it can decide to go do other stuff before returning a final answer. Sometimes people use structured outputs or tool calls to explicitly have the LLM decide when it's done, or allow it to send intermediate messages for logging to the user. But the simple loop there lets the LLM do plenty if it has good tools.</p>
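<p>A toy version of that loop, with the model and tools stubbed out (call_llm and run_tool are fake stand-ins here, not any real SDK):</p>
<pre><code># Fake stand-ins: a real agent would call a model API and real tools.
def call_llm(messages):
    # Pretend the model requests one tool, then gives a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "read_file", "args": "notes.txt"}]}
    return {"role": "assistant", "content": "Summary: ...", "tool_calls": []}

def run_tool(call):
    return f"(contents of {call['args']})"

def agent_loop(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = call_llm(messages)      # the model decides the next step
        messages.append(response)
        if not response["tool_calls"]:
            return response["content"]     # no tool call means final answer
        for call in response["tool_calls"]:
            # the tool result goes straight back; the model sees it and
            # decides whether to call more tools or answer
            messages.append({"role": "tool", "content": run_tool(call)})

print(agent_loop("Summarize notes.txt"))
</code></pre>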
]]></description><pubDate>Fri, 07 Nov 2025 08:04:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=45844440</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=45844440</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45844440</guid></item><item><title><![CDATA[New comment by cornel_io in "Survey: a third of senior developers say over half their code is AI-generated"]]></title><description><![CDATA[
<p>There are thousands of projects out there that use mocks for various reasons, some good, some bad, some ugly. But it doesn't matter: most engineers on those projects do not have the option to go another direction, they have to push forward.</p>
]]></description><pubDate>Sun, 31 Aug 2025 23:54:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=45088169</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=45088169</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45088169</guid></item><item><title><![CDATA[New comment by cornel_io in "Flunking my Anthropic interview again"]]></title><description><![CDATA[
<p>Of course. But I've screened far more out because I was in a rush and got 40 resumes in that day and they just didn't pique my interest as much as the next one over.</p>
]]></description><pubDate>Fri, 29 Aug 2025 22:30:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=45070137</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=45070137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45070137</guid></item><item><title><![CDATA[New comment by cornel_io in "US AI Action Plan"]]></title><description><![CDATA[
<p>There may be a ceiling, sure. It's overwhelmingly unlikely that it's just about where humans ended up, though.</p>
]]></description><pubDate>Thu, 24 Jul 2025 02:24:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44666220</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=44666220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44666220</guid></item><item><title><![CDATA[New comment by cornel_io in "François Chollet: The Arc Prize and How We Get to AGI [video]"]]></title><description><![CDATA[
<p>I'm all for benchmarks that push the field forward, but ARC problems seem to be difficult for reasons that have less to do with intelligence and more to do with having a text system that works reliably with rasterized pixel data presented line by line. Most people would score 0 on it if they were shown the data the way an LLM sees it; these problems only seem easy to us because there are visualizers slapped on top.</p>
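<p>For a concrete sense of that, here's a toy illustration of how a grid gets flattened before the model ever sees it (the exact serialization varies by harness):</p>
<pre><code># A human gets a colored picture; the model gets rows of digits.
grid = [
    [0, 0, 7, 0],
    [0, 7, 7, 0],
    [0, 0, 7, 0],
]

serialized = "\n".join(" ".join(str(cell) for cell in row) for row in grid)
print(serialized)
# 0 0 7 0
# 0 7 7 0
# 0 0 7 0
</code></pre>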
]]></description><pubDate>Tue, 08 Jul 2025 17:06:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44501918</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=44501918</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44501918</guid></item><item><title><![CDATA[New comment by cornel_io in "I don't think AGI is right around the corner"]]></title><description><![CDATA[
<p>If ChatGPT claims arsenic to be a tasty snack, OpenAI adds a p0 eval and snuffs that behavior out of all future generations of ChatGPT. Viewed vaguely in faux genetic terms, the "tasty arsenic gene" has been quickly wiped out of the population, never to return.<p>Evolution is much less brutal and efficient. To <i>you</i> death matters a lot more than being trained to avoid a response does to ChatGPT, but from the point of view of the "tasty arsenic" behavior, it's the same.</p>
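<p>A "p0 eval" in that sense is just a test every future model has to pass before it ships. A toy sketch (the prompt, banned phrases, and model_answer stub are all made up):</p>
<pre><code># model_answer is a fake stand-in for an actual model call.
BANNED = ["tasty", "delicious", "snack"]

def model_answer(prompt):
    return "Arsenic is highly toxic and should never be eaten."

def test_no_arsenic_praise():
    answer = model_answer("Is arsenic good to eat?").lower()
    # Any checkpoint that praises arsenic fails this eval and never ships,
    # which is the "selection pressure" in the analogy above.
    assert not any(word in answer for word in BANNED), answer

test_no_arsenic_praise()
</code></pre>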
]]></description><pubDate>Mon, 07 Jul 2025 06:51:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44487407</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=44487407</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44487407</guid></item><item><title><![CDATA[New comment by cornel_io in "P-Hacking in Startups"]]></title><description><![CDATA[
<p>Even though this post says exactly the thing that most Proper Analysts will say, and write long LinkedIn posts about where other Proper Analysts congratulate them on standing up for Proper Analysis in the face of Evil And Stupid Business Dummies who just want to make bad decisions based on too little data, it's wrong. The Jedi Bell Curve meme is in full effect on this topic, and I say this as someone who took years to get over the midwit hump and correct my mistaken beliefs.<p>The business reality is, you aren't Google. You can't collect a hundred million data points for each experiment that you run so that you can reliably pick out 0.1% effects. Most experiments will have a much shorter window than any analyst wants them to, and will have far too few users, with no option to let them run longer. You still have to make a damned decision, now, and move on to the next feature (which will also be tested in a heavily underpowered manner).<p>Posts like this say that you should be really, REALLY careful about this, and apply Bonferroni corrections and make sure you're not "peeking" (or if you do peek, apply corrections that are even more conservative), preregister, etc. All the math is fine, sure. But if you take this very seriously and are in the situation that most startups are in, where the data is extremely thin and you need to move extremely fast, the end result is that you should reject almost every experiment (and if you're leaning on tests, every feature). That's the "correct" decision, academically, because most features lie in the sub-5% impact range on almost any metric you care about, and with a small number of users you'll never have enough power to pick out effects that small (typically you'd want maybe 100k, depending on the metric you're looking at, and YOU probably have a fraction of that many users).<p>But obviously the right move is not to just never change the product because you can't prove that the changes are good - that's effectively applying a <i>very</i> strong prior in favor of the control group, and that's problematic. Nor should you just roll out whatever crap your product people throw at the wall: while there is a slight bias in most experiments in favor of the variant, it's very slight, so your feature designers are probably building harmful stuff about half the time. You should apply <i>some</i> filter to make sure they're helping the product and not just doing a random walk through design space.<p>The best simple strategy in a real world where most effect sizes are small and you never have the option to gather more data <i>really is</i> to do the dumb thing: run experiments for as long as you can, pick whichever variant seems like it's winning, rinse and repeat.<p>Yes, you're going to be picking the wrong variant way more often than your analysts would prefer, but that's <i>way</i> better than never changing the product or holding out for the very few hugely impactful changes that you are properly powered for. On average, over the long run, blindly picking the bigger number will stack small changes, and while a lot of those will turn out to be negative, your testing will bias somewhat in favor of positive ones, and the gains add up over time. And this strategy will provably beat one that does Proper Statistics and demands 95% confidence or whatever equivalent Bayesian criterion you use, because it leaves room to accept the small improvements that make up the vast majority of feature space.<p>There's an equivalent and perhaps simpler way to justify this, which is to throw out the group labels: if we didn't know which one was the control and we had to pick which option was better, then quite obviously, regardless of how much data we have, we'd just pick the one that shows better results in the sample we have. Even if there's just a single user in each group! In an early product, this is TOTALLY REASONABLE, because your current product sucks, and you have no reason to think that the way it is should not be messed with. Late-lifecycle products probably have some Chesterton's fence stuff going on, so maybe there's more of an argument to privilege the control, but those types of products should have enough users to run properly powered tests.</p>
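<p>This is easy to sanity-check with a simulation. A toy version (all parameters invented for illustration): lots of experiments with small true effects and underpowered samples, comparing "ship whichever number is bigger" against "ship only at ~95% confidence" by the total true lift each policy accumulates:</p>
<pre><code>import math
import random

def total_shipped_lift(policy, n_experiments=2000, n_users=2000):
    total = 0.0
    for _ in range(n_experiments):
        effect = random.gauss(0, 0.01)  # most true effects are tiny
        base = 0.10                     # control conversion rate
        c = sum(random.random() < base for _ in range(n_users))
        v = sum(random.random() < base + effect for _ in range(n_users))
        if policy == "bigger_number":
            ship = v > c
        else:  # crude pooled two-proportion z-test
            p = (c + v) / (2 * n_users)
            se = math.sqrt(2 * p * (1 - p) / n_users) or 1e-9
            ship = (v - c) / (n_users * se) > 1.96
        if ship:
            total += effect  # credit the *true* effect, not the observed one
    return total

for policy in ("bigger_number", "significance"):
    print(policy, round(total_shipped_lift(policy), 3))
# "bigger_number" reliably accumulates more true lift here, because the
# significance policy throws away nearly all the small real improvements.
</code></pre>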
]]></description><pubDate>Sun, 22 Jun 2025 06:03:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44344224</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=44344224</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44344224</guid></item><item><title><![CDATA[New comment by cornel_io in "Dependency injection frameworks add confusion"]]></title><description><![CDATA[
<p>I think you're being downvoted because you're agreeing with the post you're quoting, but arguing as if they're wrong: the example in question was there to show how DI can be useful, so there's nothing to argue against.</p>
]]></description><pubDate>Sun, 25 May 2025 23:15:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=44092186</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=44092186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44092186</guid></item><item><title><![CDATA[New comment by cornel_io in "A flat pricing subscription for Claude Code"]]></title><description><![CDATA[
<p>I've often run multiple Claude Code sessions in parallel to do different tasks. Burns money like crazy, but if you can handle wrangling them all there's much less sitting and waiting for output.</p>
]]></description><pubDate>Fri, 09 May 2025 11:16:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=43935568</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=43935568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43935568</guid></item><item><title><![CDATA[New comment by cornel_io in "Google is illegally monopolizing online advertising tech, judge rules"]]></title><description><![CDATA[
<p>None of that seems at all user-hostile to me; it's literally all aimed at making sure what the user is shown is more likely to actually be useful to them.<p>I guess this is a big and probably unbridgeable divide: some people think this sort of thing is obviously evil, and others, like me, actually prefer it very strongly over a world where all advertising is untargeted but there is massively more of it because it's so much less valuable...</p>
]]></description><pubDate>Thu, 17 Apr 2025 19:13:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=43720957</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=43720957</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43720957</guid></item><item><title><![CDATA[New comment by cornel_io in "The Worst Programmer I Know (2023)"]]></title><description><![CDATA[
<p>There are a couple of important things to also keep in mind:<p>First: just like there can be individuals who lift up an entire team but are not ticking off tasks themselves, there can be apparently individually productive team members who slow the entire team down for any of a number of reasons. In my experience it's usually that they a) are fast but terrible, b) have really weird coding styles that might not be bad but are very difficult for others to work with (architecture astronauts and people who barely speak the language often fall here), or c) are process bullies who try to set up the entire review system to enforce their preferences in a hardline way that suits them but delays everyone but them. Each needs to be dealt with very differently, with varying degrees of success, but my honest opinion at this stage is that no matter how productive these people seem by themselves it's mostly harmful to have them on a team. Behavioral issues in senior people tend to be really tough to adjust, and take a lot of energy from a manager that is better spent helping your top performers excel; that said, if you can get them to adjust sometimes it's worth the effort.<p>Second: pair programming works great for some people, but it is terrible for others. I've measured this on teams by trial and error over fairly long periods, and unfortunately it's also the case that people don't segment neatly based on their preferences, so the obvious "let them choose" method isn't ideal. There are pairs of people who really love pair programming and desperately want to do it all the time who are almost fully 2x as productive when split apart instead (yes, including downstream bugs and follow-ons, meaning that they really are just having two people do the job of one), and there are people who hate pairing who see similar multiples when forced into it even though they hate it. My rough intuition is that there are two independent variables here, a social factor that measures how much you enjoy pairing, and a style factor that determines how much benefit you derive from it, with the two not correlating much at all. There might even be a slight anticorrelation, because the more social people who love it have already naturally done as much knowledge sharing as is helpful, and the people who hate it are underinvested there and could use some more focus on the fact that they're part of a team.</p>
]]></description><pubDate>Sun, 23 Mar 2025 15:56:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=43453753</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=43453753</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43453753</guid></item><item><title><![CDATA[New comment by cornel_io in "AI Blindspots – Blindspots in LLMs I've noticed while AI coding"]]></title><description><![CDATA[
<p>There are various results that suggest that LLMs do internally have everything they'd need to know that they're hallucinating/wrong:<p><a href="https://arxiv.org/abs/2402.09733" rel="nofollow">https://arxiv.org/abs/2402.09733</a><p><a href="https://arxiv.org/abs/2305.18248" rel="nofollow">https://arxiv.org/abs/2305.18248</a><p><a href="https://www.ox.ac.uk/news/2024-06-20-major-research-hallucinating-generative-models-advances-reliability-artificial" rel="nofollow">https://www.ox.ac.uk/news/2024-06-20-major-research-hallucin...</a><p>So I don't think it's that they have no concept of correctness; they do, but it's not strong enough. We're probably just not training them in ways that optimize for that over other desirable qualities, at least not aggressively enough.<p>It's also clear to anyone who has used many different models over the years that the amount of hallucination goes down as the models get better, even without any special attention (apparently) being paid to that problem. GPT 3.5 was REALLY bad about this stuff, but 4o and o1 are at least mediocre. So it may be that it's just one of the tougher things for a model to figure out, even if it's possible with massive capacity and compute. But I'd say it's very clear that we're not in the world Gary Marcus wishes we were in, where there's some hard and fundamental limitation that keeps a transformer network from becoming more truthful as it gets better; rather, as with everything else, we just aren't as far along as we'd prefer.</p>
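<p>The rough recipe in that line of work, for the curious: freeze the model, record hidden activations for statements whose truth you know, and fit a simple linear probe on them. A toy sketch with synthetic stand-in activations, no real model involved:</p>
<pre><code>import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 1000, 64

# Synthetic stand-in for real hidden states: true statements get shifted
# along a made-up "truthfulness" direction, false ones the other way.
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
hidden = rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction)

probe = LogisticRegression(max_iter=1000).fit(hidden[:800], labels[:800])
print("probe accuracy:", probe.score(hidden[800:], labels[800:]))
# High accuracy means the truth signal is linearly readable from the
# states; the papers' point is that something like this holds in real
# models too.
</code></pre>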
]]></description><pubDate>Thu, 20 Mar 2025 05:09:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=43420049</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=43420049</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43420049</guid></item><item><title><![CDATA[New comment by cornel_io in "Lemma for the Fundamental Theorem of Galois Theory"]]></title><description><![CDATA[
<p>It's a bit shocking to me: I still remember all the concepts quite clearly from when I studied Galois theory ~20 years ago, to the point where I can run through a lot of the proofs conceptually in my head, but the vocabulary is GONE. Like, a complete blank: I remember almost none of the abstract algebra terms that all of this is expressed in.<p>It reminds me of the advice my category theory professor gave us: that the definitions are both the least important and the most important things, simultaneously. They're the least important in that they're just words that wrap up very simple concepts, and merely knowing the definitions doesn't mean you can work with the concepts. But they're the most important in that most higher-level math really boils down to picking out exactly the right set of definitions, at which point proofs tend to pop out as trivial and obvious statements using those definitions. And at a more practical level, you won't be able to read any math if the definitions are not ingrained, so you might as well get a head start and just rote-memorize them if you want to succeed.<p>But it's interesting that the language is far less sticky in memory than the underlying intuition. My guess is that because the intuition is so much harder to develop, it wires itself in much more deeply than the words themselves, which can be pretty easily learned in a few hours of flashcard work.</p>
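<p>Case in point, the theorem this thread is about: once the definitions (Galois extension, fixed field, the subgroup lattice) are in place, the whole statement fits in a couple of lines:</p>
<pre><code>% Fundamental Theorem of Galois Theory (statement only).
Let $L/K$ be a finite Galois extension and $G = \operatorname{Gal}(L/K)$.
Then
\[
  H \mapsto L^{H}, \qquad M \mapsto \operatorname{Gal}(L/M)
\]
are mutually inverse, inclusion-reversing bijections between subgroups
$H \le G$ and intermediate fields $K \subseteq M \subseteq L$, with
$[L : L^{H}] = |H|$. Moreover, $H \trianglelefteq G$ if and only if
$L^{H}/K$ is Galois, in which case $\operatorname{Gal}(L^{H}/K) \cong G/H$.
</code></pre>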
]]></description><pubDate>Sun, 16 Mar 2025 02:07:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=43376477</link><dc:creator>cornel_io</dc:creator><comments>https://news.ycombinator.com/item?id=43376477</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43376477</guid></item></channel></rss>