<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: thethirdone</title><link>https://news.ycombinator.com/user?id=thethirdone</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 14:24:31 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=thethirdone" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by thethirdone in "Is anybody else bored of talking about AI?"]]></title><description><![CDATA[
<p>The grid is not all energy use. To put the numbers on an even playing field you need to account for the fact that only ~40% of energy use goes through the grid.<p>And that still leaves roughly a 6:1 ratio assuming the projections run true. It very well might be possible to get efficiency wins from the transportation sector that outweigh growth in AI.</p>
]]></description><pubDate>Tue, 24 Mar 2026 23:17:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510964</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=47510964</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510964</guid></item><item><title><![CDATA[New comment by thethirdone in "Is anybody else bored of talking about AI?"]]></title><description><![CDATA[
<p>> Also, I'm not sure about your math. 4% would be 4% of the whole like in a pie chart, not 4% of the remainder after removing one slice. 4% AI, 30% transportation, 66% other. I don't know where that 40% is from.<p>The 40% is the share of US energy use that takes the form of electricity. It was a rough number pulled from memory, but it is roughly right. Check <a href="https://www.eia.gov/energyexplained/us-energy-facts/" rel="nofollow">https://www.eia.gov/energyexplained/us-energy-facts/</a><p>AI is not currently 4% of the energy market of the US. Only of the grid. I should have been clearer about the ALL ENERGY vs GRID distinction.<p>> I think it might be more emissions-efficient at generating value than AI by a factor exceeding the 7.5x energy use. Moving rocks from (place with rocks) to (place that needs rocks) continues to be just an insanely good thing for humanity.<p>I really made no statement on the value of doing things. Transportation is obviously very valuable. I just wanted a more fact-based conversation.</p>
]]></description><pubDate>Tue, 24 Mar 2026 23:13:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47510913</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=47510913</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47510913</guid></item><item><title><![CDATA[New comment by thethirdone in "Is anybody else bored of talking about AI?"]]></title><description><![CDATA[
<p>Compare that to ~30% of all energy use for transportation. So approximately 40%*4% = 1.6% vs 30%. I find your correction to be more wrong than the initial statement.<p>> And most of that new capacity will be natural gas. That increase would basically whipe out the reduction in CO2 emissions the USA has had since 2018.<p>Emissions in 2018 were ~5,250M metric tons and in 2024 they were ~4,750M. That is a reduction of roughly 10% of total emissions. Without going into calculations of green electricity and such, it's still safe to say AI using 10% of the grid would not completely wipe out the reduction.<p>[0]: <a href="https://www.statista.com/statistics/183943/us-carbon-dioxide-emissions-from-1999/" rel="nofollow">https://www.statista.com/statistics/183943/us-carbon-dioxide...</a></p>
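<p>Spelling out the arithmetic in Python (the 40% and 30% shares are from the EIA link above; 10% of the grid is the projection under discussion):<pre><code>
# Back-of-the-envelope shares of total US energy use.
grid_share = 0.40        # ~40% of US energy use is delivered as electricity (EIA)
ai_grid_share = 0.04     # AI's current ~4% share of the grid
transport_share = 0.30   # transportation's ~30% share of ALL energy use

ai_total_share = grid_share * ai_grid_share
print(f"AI share of all energy: {ai_total_share:.1%}")   # 1.6%
print(f"transport / AI ratio:   {transport_share / ai_total_share:.1f}x")  # 18.8x

# Projected case: AI grows to 10% of the grid.
ai_projected = grid_share * 0.10
print(f"projected AI share:     {ai_projected:.1%}")      # 4.0%
print(f"projected ratio:        {transport_share / ai_projected:.1f}x")    # 7.5x

# Emissions check: 2018 vs 2024 US CO2 (million metric tons, per [0]).
reduction = (5250 - 4750) / 5250
print(f"2018 to 2024 reduction: {reduction:.1%}")         # ~9.5%
</code></pre>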
]]></description><pubDate>Tue, 24 Mar 2026 21:36:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47509746</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=47509746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47509746</guid></item><item><title><![CDATA[New comment by thethirdone in "Learning athletic humanoid tennis skills from imperfect human motion data"]]></title><description><![CDATA[
<p>I agree that the movements look quite robotic (though not as much as you might expect), but I don't think any movies have depicted robots moving like that. A much more common depiction is moving only a single joint at a time.<p>> Sharp, unsure movements, a lot of hesitation, ...<p>I like these particular descriptors. Another I would add is holding poses unnaturally still. While waiting for the ball, the robot holds its racket extremely consistently relative to its body even while sharply turning.</p>
]]></description><pubDate>Sun, 15 Mar 2026 22:26:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47392650</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=47392650</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47392650</guid></item><item><title><![CDATA[New comment by thethirdone in "Measuring AI agent autonomy in practice"]]></title><description><![CDATA[
<p>You would not. You don't normally post lots of comments. The occasional return after a long period of inactivity is not in itself suspicious.</p>
]]></description><pubDate>Fri, 20 Feb 2026 03:31:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47083323</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=47083323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47083323</guid></item><item><title><![CDATA[New comment by thethirdone in "Giving up upstream-ing my patches and feel free to pick them up"]]></title><description><![CDATA[
<p>The d suffix keeps it from compiling under clang. The PRs seem to be mostly small changes that are clear improvements.</p>
]]></description><pubDate>Sat, 31 Jan 2026 16:07:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46837837</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46837837</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46837837</guid></item><item><title><![CDATA[New comment by thethirdone in "Deep dive into Turso, the “SQLite rewrite in Rust”"]]></title><description><![CDATA[
<p>The criteria were laid out in 2019 [0]. It was less clear then.<p>> If you are a "rustacean" and feel that Rust already meets the preconditions listed above, and that SQLite should be recoded in Rust, then you are welcomed and encouraged to contact the SQLite developers privately and argue your case.<p>It seems like the criteria are less a list of things the SQLite developers claim Rust can't do and more non-negotiable properties that need to be considered before even bringing the idea of a Rust version to the team.<p>I think it is at least arguable that Rust does not meet the requirements. And they did explicitly invite private argument if you feel differently.<p>[0]: <a href="https://web.archive.org/web/20190423143433/https://sqlite.org/whyc.html" rel="nofollow">https://web.archive.org/web/20190423143433/https://sqlite.or...</a></p>
]]></description><pubDate>Fri, 30 Jan 2026 04:58:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46820687</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46820687</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46820687</guid></item><item><title><![CDATA[Can AI Pass Freshman CS? [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=56HJQm5nb0U">https://www.youtube.com/watch?v=56HJQm5nb0U</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46700902">https://news.ycombinator.com/item?id=46700902</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 21 Jan 2026 03:56:06 +0000</pubDate><link>https://www.youtube.com/watch?v=56HJQm5nb0U</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46700902</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46700902</guid></item><item><title><![CDATA[New comment by thethirdone in "Provably unmasking malicious behavior through execution traces"]]></title><description><![CDATA[
<p>Based on Table 1: this method is actually worse than generating a random number (0-100%, independent of the program) and testing whether it is less than 98.8%. That would achieve a better detection rate without increasing the false positive rate.<p>Given that, it doesn't seem worth following the math to see if there is something interesting.</p>
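<p>The baseline is easy to verify with a quick simulation: a classifier that never looks at the program and flags with probability p has detection rate and false positive rate both equal to p (the labels below are just placeholders):<pre><code>
import random

random.seed(0)
P_FLAG = 0.988  # flag if a uniform draw falls below 98.8%
N = 100_000

# The classifier ignores its input, so it behaves identically
# on malicious and benign programs.
def random_classifier(_program):
    return random.random() < P_FLAG

detections = sum(random_classifier("malicious") for _ in range(N)) / N
false_pos  = sum(random_classifier("benign") for _ in range(N)) / N
print(f"detection rate:      {detections:.3f}")  # ~0.988
print(f"false positive rate: {false_pos:.3f}")   # ~0.988
</code></pre>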
]]></description><pubDate>Wed, 21 Jan 2026 00:30:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46699665</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46699665</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46699665</guid></item><item><title><![CDATA[New comment by thethirdone in "Let's be honest, Generative AI isn't going all that well"]]></title><description><![CDATA[
<p>I have seen many people try to use Claude Code and get LOTS of bugs. Show me any >10k-line project you have made with it and I will put in the effort to find a bug, free of charge.</p>
]]></description><pubDate>Tue, 13 Jan 2026 23:52:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46610272</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46610272</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46610272</guid></item><item><title><![CDATA[New comment by thethirdone in "Let's be honest, Generative AI isn't going all that well"]]></title><description><![CDATA[
<p>Which of those have been achieved, in your opinion?<p>I think arbitrary proofs from the mathematical literature is probably the most solved one. Research into IMO problems and Lean formalization work has been pretty successful.<p>Then reading a novel and answering questions is probably the next most successful.<p>Reliably constructing 10k bug-free lines is probably the least successful. AI tends to produce more bugs than human programmers, and I have yet to meet a programmer who can <i>reliably</i> produce fewer than 1 bug per 10k lines.</p>
]]></description><pubDate>Tue, 13 Jan 2026 23:35:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46610070</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46610070</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46610070</guid></item><item><title><![CDATA[Radiance Meshes for Volumetric Reconstruction]]></title><description><![CDATA[
<p>Article URL: <a href="https://half-potato.gitlab.io/rm/">https://half-potato.gitlab.io/rm/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46249164">https://news.ycombinator.com/item?id=46249164</a></p>
<p>Points: 11</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 12 Dec 2025 21:25:51 +0000</pubDate><link>https://half-potato.gitlab.io/rm/</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46249164</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46249164</guid></item><item><title><![CDATA[New comment by thethirdone in "The "confident idiot" problem: Why AI needs hard rules, not vibe checks"]]></title><description><![CDATA[
<p>You did not actually address the core of my points at all.<p>> It isn't a case of ratio it is a fundamentally different method of working hence my point of not needing all human literary output do the the equivalent of an LLM.<p>You can make ratios of anything. I agree that human cognition is different from LLM cognition, though I would think of it more as a phase difference than a fundamentally different phenomenon. Think liquid water vs steam: the density (a ratio) is vastly different, and they have different, harder-to-describe properties (surface tension, filling a volume, incompressible vs compressible).<p>> Humans provide the connections, the reasoning the thought the insights and the subsequent correlations THEN we humans try to make a good pattern matcher/ guesser (the LLM) to match those.<p>Yes, humans provide the training data and the benchmarks for measuring LLM improvement. Somehow meaning about the world has to be trained on for there to be any understanding. However, humans talking about patterns in numbers is not how the LLMs learned this. It is very much from seeing lots of examples and deducing the pattern (during training, not inference). The fact that a general pattern is embedded in the weights implies that some general understanding of many things is baked into the model.<p>> This common retort: most humans also makes mistakes, or most humans also do x, y, z means nothing.<p>It is not a retort, but an argument about what "understanding" means. From what you have said, my guess is that your definition makes "understanding" something humans do and computers are incapable of (by definition). If LLMs could outcompete humans in all professional tasks, I think it would be hard to say they understand nothing. Humans are a worthwhile point of comparison, and human exceptionalism can only really hold up until it is surpassed.<p>I would also point out that some humans DO understand the properties of numbers I was referring to. In fact, I figured them out in second grade while doing lots of extra multiplication problems as punishment for being a brat.<p>> My digital thermometer uses an algorithm to determine the temperature. ... The paper will not be thinking if that is done.<p>I did not say "all algorithms are thinking". The stronger version of what I was saying is "some algorithms can think." You have simply asserted the opposite with no reasoning.<p>> In fact at the extreme end this anthropomorphising has led to exacerbating mental health conditions and unfortunately has even led to humans killing themselves.<p>I do concede that anthropomorphizing can be problematic, especially without a background in CS and ML to understand what is under the hood. However, you completely skipped past my rather specific explanation of how it can be useful. On HN in particular, I do expect people to bring enough technical understanding to the table to not just treat LLMs as people.</p>
]]></description><pubDate>Mon, 08 Dec 2025 23:46:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=46199320</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46199320</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46199320</guid></item><item><title><![CDATA[New comment by thethirdone in "The "confident idiot" problem: Why AI needs hard rules, not vibe checks"]]></title><description><![CDATA[
<p>> The simplest example being that LLM's somehow function in a similar fashion to human brains. They categorically do not. I do not have most all of human literary output in my head and yet I can coherently write this sentence.<p>The ratio of cognition to knowledge is much higher in humans than in LLMs. That is for sure. It is improving in LLMs, particularly in small distillations of large models.<p>A lot of where the discussion gets hung up is just words. I used "knowledge" to mean the ability to recall and recite a wide range of facts, and "cognition" to mean the ability to generalize, notice novel patterns, and execute algorithms.<p>> They don't actually understand anything about what they output. It's just text.<p>In the case of number multiplication, a bunch of papers have shown that the correct algorithms for the first and last digits of the product are embedded in the model weights. I think that counts as "understanding"; most humans I have talked to do not have that understanding of numbers.<p>> It's just an algorithm.<p>> I am surprised so many in the HN community have so quickly taken to assuming as fact that LLM's think or reason. Even anthropomorphising LLM's to this end.<p>I don't think something being an algorithm means it can't reason, know, or understand. I can come up with perfectly rigorous definitions of those words that would not have been objectionable to almost anyone in 2010, but that current LLMs would pass.<p>I have found anthropomorphizing LLMs to be a reasonably practical way to leverage the human skill of empathy to predict LLM performance. Treating them solely as text predictors doesn't offer any similar prediction; they are simply too complex to fit into a human mind. Paying a lot of attention to benchmarks, papers, and personal experimentation can give you enough data to make predictions, but that only covers current models, is a lot of work, and isn't much more accurate than anthropomorphization.</p>
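<p>For the last digit the pattern is just arithmetic mod 10: the last digit of a product depends only on the last digits of the factors. A quick illustration (my example, not from those papers):<pre><code>
import random

random.seed(1)
for _ in range(5):
    a, b = random.randrange(10**6), random.randrange(10**6)
    # The product's last digit is determined by the factors' last digits.
    assert (a * b) % 10 == ((a % 10) * (b % 10)) % 10
    print(f"{a} * {b} ends in {(a * b) % 10}")
</code></pre>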
]]></description><pubDate>Mon, 08 Dec 2025 20:38:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46197321</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46197321</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46197321</guid></item><item><title><![CDATA[New comment by thethirdone in "Ilya Sutskever, Yann LeCun and the End of “Just Add GPUs”"]]></title><description><![CDATA[
<p>I would hope god can do better than 40% on a test. If you select human experts from the relevant fields, together they would get at least a passing grade (70%). A group of 20 humans is not godlike.</p>
]]></description><pubDate>Thu, 27 Nov 2025 04:29:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46065661</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46065661</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46065661</guid></item><item><title><![CDATA[New comment by thethirdone in "Ilya Sutskever, Yann LeCun and the End of “Just Add GPUs”"]]></title><description><![CDATA[
<p>I disagree with the framing in 2.1 a lot.<p><pre><code>  > Models look god-tier on paper:
  >  they pass exams
  >  solve benchmark coding tasks
  >  reach crazy scores on reasoning evals
</code></pre>
Models don't look "god-tier" from benchmarks. Surely an 80% is not godlike. I would really like more human comparisons for these benchmarks to get a good idea of what an 80% means, though.<p>I would not say that any model shows a "crazy" score on ARC-AGI.<p>I have broadly seen incremental improvements on benchmarks since 2020, mostly at a level I believe to be below average human reasoning but above average human knowledge. No one would call GPT-3 godlike, and it is quite similar to modern models on benchmarks; the difference is not something like 1% vs 90%. I think most people would consider GPT-3 to be closer to Opus 4.5 than Opus 4.5 is to a human.</p>
]]></description><pubDate>Thu, 27 Nov 2025 01:22:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=46064383</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46064383</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46064383</guid></item><item><title><![CDATA[New comment by thethirdone in "TiDAR: Think in Diffusion, Talk in Autoregression"]]></title><description><![CDATA[
<p>In this paper both the diffusion and the autoregressive models are transformers with O(n^2) cost for long sequences. They share the "Exact KV Cache" for committed tokens.<p>Diffusion just lets you spend more compute in the same pass so you don't redundantly access the same memory. It can only improve speed beyond the memory bandwidth limit by committing multiple tokens each pass.<p>Other linear models like Mamba get away from the O(n^2) effects, but the type of neural architecture is orthogonal to the method of generation.</p>
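<p>A rough sketch of the bandwidth argument (all numbers below are illustrative assumptions, not from the paper): at low batch size every decoding pass has to stream the full weights from memory, so single-token decoding is capped near bandwidth / model_bytes tokens per second, and committing k tokens per pass multiplies that ceiling by k.<pre><code>
# Illustrative numbers, not from the paper.
model_bytes = 14e9    # e.g. a 7B-parameter model at 2 bytes per parameter
bandwidth   = 1.0e12  # 1 TB/s of memory bandwidth

# Weights must be read once per decoding pass, regardless of
# how many tokens that pass commits.
time_per_pass = model_bytes / bandwidth

for k in (1, 2, 4, 8):  # tokens committed per pass
    tps = k / time_per_pass
    print(f"{k} token(s)/pass -> {tps:,.0f} tokens/s")
# 1 token/pass caps throughput at ~71 tokens/s here;
# k tokens/pass raises the ceiling k-fold.
</code></pre>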
]]></description><pubDate>Sat, 22 Nov 2025 22:06:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46018684</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=46018684</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46018684</guid></item><item><title><![CDATA[New comment by thethirdone in "Stop Explaining What Things Are"]]></title><description><![CDATA[
<p>It's not clear to me this is an actual problem. I just googled "how to fix a Git conflict" and not a single result has multiple paragraphs describing what things are.<p>The first result [0] pretty much immediately drops into what commands to run. If that result is part of the problem, I fully disagree that it is a problem.<p>[0]: <a href="https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line" rel="nofollow">https://docs.github.com/en/pull-requests/collaborating-with-...</a></p>
]]></description><pubDate>Thu, 06 Nov 2025 01:08:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=45830216</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=45830216</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45830216</guid></item><item><title><![CDATA[New comment by thethirdone in "Claude Sonnet 4.5"]]></title><description><![CDATA[
<p>Do you actually disagree with the "minutiae was always borderline irrelevant" part, or that it comes along with "making somebody money"? I pretty strongly agree with the original quote, including the "possibly with software" part.<p>Minutiae such as tabs vs spaces and other formatting choices are pretty clearly "borderline irrelevant," and code formatters have largely stopped programmers from arguing about them. Exactly how to best factor your code into functions and classes is also commonly argued about but "borderline irrelevant." Arguments about "clean code" are a good example of this.<p>Broadly, the skills that LLMs make useless to have honed are the minutiae that were already "borderline irrelevant." Knowing how to make your code performant, knowing how to design good APIs that can stay stable long term, and in general having good taste in architecture are still very useful. In fact they are more useful now.</p>
]]></description><pubDate>Tue, 30 Sep 2025 03:10:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=45421518</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=45421518</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45421518</guid></item><item><title><![CDATA[New comment by thethirdone in "Irrelevant facts about cats added to math problems increase LLM errors by 300%"]]></title><description><![CDATA[
<p>There is a long history of people thinking humans are special and better than animals and technology. For animals, people actually thought they couldn't feel pain and did not even consider the ways in which they might be cognitively ahead of humans. Technology often follows the path from "working, but worse than the manual alternative" to "significantly better than any previous alternative," despite naysayers saying that beating the manual alternative is literally impossible.<p>LLMs are different from humans, but they also reason and make mistakes in the most human way of any technology I am aware of. Asking yourself "how would a human respond to this prompt if they had to type it out without ever going back to edit it?" seems very effective to me. Sometimes thinking about LLMs as a model, with a focus on how they are trained, explains behavior, but the anthropomorphism seems more effective at actually predicting behavior.</p>
]]></description><pubDate>Wed, 30 Jul 2025 01:33:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44730168</link><dc:creator>thethirdone</dc:creator><comments>https://news.ycombinator.com/item?id=44730168</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44730168</guid></item></channel></rss>