<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: wbhart</title><link>https://news.ycombinator.com/user?id=wbhart</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 05 May 2026 08:39:41 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=wbhart" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by wbhart in "Gemini with Deep Think achieves gold-medal standard at the IMO"]]></title><description><![CDATA[
<p>I have not been working on formalization but on theorem proving, so I can't confidently answer some of those questions.<p>However, I recognise that there is relatively little training data for LLMs that want to use the Lean language. Moreover, you are really teaching the model how to apply "Lean tactics", which may or may not be related to what mathematicians do in the standard texts on which LLMs have trained. Finally, the foundations are different: dependent type theory, instead of the set theory that mathematicians use.<p>My own perspective is that esoteric formal languages serve a purpose, but not this one. Most mathematicians have not been keen on the idea (with a handful of famous exceptions), but it seems to have gained a lot of traction anyway.<p>I'd personally like to see more money put into informal symbolic theorem-proving tools that can very rapidly find a solution as close to natural language, and to the usual foundations, as possible. But funding seems to be a huge issue. Academia has been bled dry, and no one has an appetite for huge multi-year projects of that kind.</p>
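A minimal illustration of what "applying Lean tactics" means, assuming a recent Lean 4 (the theorem name here is my own invention): the proof is a sequence of tactic invocations rather than prose.

```lean
-- A Lean proof is a list of tactic steps, not a prose argument:
theorem double_eq (n : Nat) : n + n = 2 * n := by
  omega  -- a decision procedure for linear integer arithmetic
```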
]]></description><pubDate>Tue, 22 Jul 2025 05:15:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=44643460</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=44643460</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44643460</guid></item><item><title><![CDATA[New comment by wbhart in "Gemini with Deep Think achieves gold-medal standard at the IMO"]]></title><description><![CDATA[
<p>Lean is an interactive prover, not an automated prover. Last year a lot of human effort was required to formalise the problems in Lean before the machines could get to work. This year you get natural-language input and output, and much faster results.<p>The advantage of Lean is that the system checks the solutions, so hallucination is impossible. Of course, one still relies on the problems and solutions being translated to and from natural language correctly.<p>Some people prefer difficult-to-read, formally checked solutions over informal but readable solutions. The two approaches are just solving different problems.<p>But there is another important reason to want to do this reliably in natural language: you can't use Lean for other domains (with a few limited exceptions). They want to train their RL pipelines for general intelligence and make them reliable for long-horizon problems. If a tool is needed as a crutch, then it more or less demonstrates that LLMs will not be enough in any domain, and we'll have to wait for traditional AI to catch up in every domain.</p>
]]></description><pubDate>Mon, 21 Jul 2025 21:05:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=44640406</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=44640406</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44640406</guid></item><item><title><![CDATA[New comment by wbhart in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"]]></title><description><![CDATA[
<p>1. Correct.<p>2. Correct; however, you can use Waksman as a base case and always beat Strassen (though it is not asymptotically better, of course).<p>5. Possible, but even so, there is already an algorithm that works with 46 real multiplications (and some divisions by 2). The real numbers are commutative and admit division by 2.</p>
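For context, Strassen's classic 2x2 scheme reduces the 8 entrywise products to 7; applied recursively, a 4x4 product costs 7 × 7 = 49 multiplications, which is the count Waksman's 46 undercuts. A minimal sketch (not the Waksman scheme itself, which is more involved):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 products."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # Seven multiplications instead of the naive eight:
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine with additions only:
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Note the scheme uses no commutativity and no division, which is why it applies over any ring; Waksman's lower count relies on commutativity and division by 2, as the comment above points out.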
]]></description><pubDate>Wed, 14 May 2025 23:02:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43990094</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=43990094</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43990094</guid></item><item><title><![CDATA[New comment by wbhart in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"]]></title><description><![CDATA[
<p>Z_2 has characteristic 2, not 0.</p>
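A quick check of this fact, as a minimal sketch (the function name is my own): the characteristic of a ring is the least n > 0 with n · 1 = 0, taken to be 0 when no such n exists. In Z_2 we have 1 + 1 ≡ 0 (mod 2).

```python
def characteristic_mod(m: int) -> int:
    """Characteristic of Z_m for m >= 1: least n > 0 with n * 1 == 0."""
    total, n = 0, 0
    while True:
        n += 1
        total = (total + 1) % m  # add 1 to itself n times in Z_m
        if total == 0:
            return n

print(characteristic_mod(2))  # 2
```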
]]></description><pubDate>Wed, 14 May 2025 22:47:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43989974</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=43989974</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43989974</guid></item><item><title><![CDATA[New comment by wbhart in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"]]></title><description><![CDATA[
<p>As already noted in a post by fdej further down, Waksman's algorithm from 1970, which works over the complex numbers, requires only 46 multiplications (and, I guess, divisions by 2, which may or may not be relevant depending on your actual ring).</p>
]]></description><pubDate>Wed, 14 May 2025 21:44:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=43989531</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=43989531</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43989531</guid></item><item><title><![CDATA[New comment by wbhart in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"]]></title><description><![CDATA[
<p>There's even an open-source implementation of Waksman's algorithm in Flint, the package fdej maintains.</p>
]]></description><pubDate>Wed, 14 May 2025 21:24:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=43989375</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=43989375</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43989375</guid></item><item><title><![CDATA[New comment by wbhart in "A cycle-accurate IBM PC emulator in your web browser"]]></title><description><![CDATA[
<p>I've been using MartyPC for a few years, and except for glitches in hardware that depend on the manufacturer, date of manufacture or even temperature, it is getting harder to find cycle-accurate tricks that MartyPC can't emulate perfectly (believe me, we've been trying).<p>The whole thing is a marvel of software engineering!<p>What is remarkable is that the author (GloriousCow) doesn't complain that people are ripping off his code and ideas, but that more people haven't used what he has learned to create other cycle-accurate emulators for the PC.</p>
]]></description><pubDate>Thu, 08 May 2025 00:25:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43921830</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=43921830</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43921830</guid></item><item><title><![CDATA[New comment by wbhart in "What we learned copying all the best code assistants"]]></title><description><![CDATA[
<p>This blog article is written in a very engaging way. It seems to be more or less a masterclass in how to keep someone's attention, although there is no meta-story making you wait for the big payoff at the end.<p>I think it is the short, punchy sections with plenty of visuals, and the fact that you are telling a story the whole way through. The story has a natural flow, with each experiment you describe leading to the next.</p>
]]></description><pubDate>Sat, 04 Jan 2025 14:43:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42594947</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=42594947</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42594947</guid></item><item><title><![CDATA[New comment by wbhart in "The Theory of Topos-Theoretic 'Bridges' – A Conceptual Introduction (2016)"]]></title><description><![CDATA[
<p>This is an interesting article, but the absence of simple examples of theories and their toposes and invariants made it seem a little abstract. Surely, if the technique is so powerful, there ought to be plenty of easy examples of it.</p>
]]></description><pubDate>Sat, 07 Sep 2024 11:46:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=41473347</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=41473347</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41473347</guid></item><item><title><![CDATA[New comment by wbhart in "'A-team' of math proves a critical link between addition and sets"]]></title><description><![CDATA[
<p>I doubt that he was intending a burn here. He's not talking about user-interface issues or the like.<p>He's most probably, I would assume, talking about the well-known issue that formalisation tools require knowledge that a standard mathematician doesn't possess: the names of all the Lean tactics, how to locate the right theorem to use in mathlib (the Lean library), the necessity of writing mathematics in a very formal language based on dependent type theory rather than on sets (which is what most mathematicians are used to), and so on.<p>Moreover, formalised mathematics does not work with natural language (and perhaps can't), and it will not accept informal or intuitive arguments. There are a lot of very fiddly things that have to be attended to. One can't assume that the reader is a skilled mathematician who can easily fill in trivial details, as when writing a maths paper for expert readers.<p>But the Lean people are acutely aware of all of this. In my opinion he's not so much offering them much-needed criticism as acknowledging one of the known barriers that keep mathematicians outside the formalisation community from getting into formalisation.<p>This is a major point of the article: real, serious mathematicians have broken through that barrier recently, and not for the first time. So things have improved enough to show that it is possible for seriously committed mathematicians to do this, even famous ones!</p>
]]></description><pubDate>Sat, 09 Dec 2023 13:14:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=38581682</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38581682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38581682</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>The field is fairly new to me. I'm originally from computer algebra, and struggling my way into the field of ATP.<p>The most interesting papers to me personally are the following three:<p>* Making Higher-Order Superposition Work. <a href="https://doi.org/10.1007/978-3-030-79876-5_24" rel="nofollow noreferrer">https://doi.org/10.1007/978-3-030-79876-5_24</a><p>* MizAR 60 for Mizar 50. <a href="https://doi.org/10.48550/arXiv.2303.06686" rel="nofollow noreferrer">https://doi.org/10.48550/arXiv.2303.06686</a><p>* Magnushammer: A Transformer-Based Approach to Premise Selection. <a href="https://doi.org/10.48550/arXiv.2303.04488" rel="nofollow noreferrer">https://doi.org/10.48550/arXiv.2303.04488</a><p>Your mileage may vary.</p>
]]></description><pubDate>Thu, 23 Nov 2023 07:22:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=38390307</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38390307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38390307</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>Yes, there are various approaches like tree-of-thought. They don't fundamentally solve the problem, because there are just too many paths to explore, and inference is just too slow and too expensive to explore 10,000 or 100,000 paths for basic problems that no one wanted to solve anyway.<p>The problem with solving such problems with LLMs is that if the solution is unlike problems seen in training, the LLM will almost always take the wrong path, and very likely won't even think of the right path at all.<p>The AI really does need to understand why the paths it tried failed in order to get insight into what might work. That's how humans work (well, it's one of many techniques we use). And despite what people think, LLMs really don't understand what they are doing. That's relatively easy to demonstrate if you get an LLM off distribution: it will double down on obvious logical errors rather than learn from the entirely new situation.</p>
]]></description><pubDate>Thu, 23 Nov 2023 06:13:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389906</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389906</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389906</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>Sure, but people have already applied deep-learning techniques to theorem proving. There are some impressive results (which the press doesn't seem at all interested in, because they don't have ChatGPT in the title).<p>It's really harder than one might imagine to develop a system which is good at higher-order logic, premise selection, backtracking, algebraic manipulation, arithmetic, conjecturing, pattern recognition and visual modeling, has good mathematical knowledge, and is autonomous and fast enough to be useful.<p>For my money, it isn't just a matter of fitting a few existing jigsaw pieces together in some new combination. Some of the pieces don't exist yet.</p>
]]></description><pubDate>Thu, 23 Nov 2023 06:02:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389839</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389839</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389839</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>People have done experiments trying to get GPT-4 to come up with viable conjectures. So far it does such a woefully bad job that it isn't worth even trying.<p>Unfortunately there are rather a lot of issues which are difficult to describe concisely, so here is probably not the best place.<p>Chief among them is the fact that an LLM would be a horribly inefficient way to do this. There are much, much better ways, which have been tried, with limited success.</p>
]]></description><pubDate>Thu, 23 Nov 2023 05:48:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389759</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389759</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>How on earth could you evaluate the scaling path with so little information? That's my point. You can't possibly know that a technology can solve a given kind of problem if so far it can only solve a completely different kind of problem that is largely unrelated!<p>Saying that performance on grade-school problems is predictive of performance on complex reasoning tasks (including theorem proving) is like saying that a new kind of mechanical engine with 90% efficiency can be scaled 10x.<p>These kinds of scaling claims drive investment, I get it. But to someone who understands (and is actually working on) the actual problem that needs solving, this kind of claim is perfectly transparent!</p>
]]></description><pubDate>Thu, 23 Nov 2023 05:42:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389729</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389729</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389729</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>I think maybe I didn't make myself quite clear here. There are already algorithms which can solve advanced mathematical problems 100% reliably (i.e. prove theorems). There are even algorithms which can prove any correct theorem that can be stated in a certain logical language, given enough time, and there are systems in which these algorithms have actually been implemented.<p>My point is that no technology which can solve grade-school maths problems would be viewed as a breakthrough by anyone who understood the problem. The fundamental problems that need to be solved are not ones you encounter in grade-school mathematics. The article is just ill-informed.</p>
]]></description><pubDate>Thu, 23 Nov 2023 05:25:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389642</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389642</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389642</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>There are certainly efforts along the lines of what you suggest. There are problems, though: the number of backtracks is 10^k, where k is not 2, or 3, or 4...<p>Another issue is that of autoformalisation. This is the one part of the problem where an LLM might be able to help, if it were reliable enough (it isn't currently), or if it could truly understand the logical structure of mathematical problems (currently they can't).</p>
]]></description><pubDate>Thu, 23 Nov 2023 05:17:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389598</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389598</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389598</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>I've tested GPT-4 on this, and it can be induced to give up on certain lines of argument after recognising they aren't leading anywhere, and to try something else. But it would require thousands of restarts (and I'm really understating here) to get through even fairly simple problems that professional mathematicians solve routinely.<p>Currently the context length isn't even long enough for it to remember what problem it was solving. And I've tried to come up with a bunch of ways around this; they all fail for one reason or another. LLMs are really a long, long way off managing this efficiently, in my opinion.</p>
]]></description><pubDate>Thu, 23 Nov 2023 05:12:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389565</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389565</guid></item><item><title><![CDATA[New comment by wbhart in "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster"]]></title><description><![CDATA[
<p>I feel very comfortable saying, as a mathematician, that the ability to solve grade-school maths problems would not at all be a predictor of the ability to solve real mathematical problems at a research level.<p>The reasons LLMs fail at solving mathematical problems are that:
   1) they are terrible at arithmetic,
   2) they are terrible at algebra,
   3) most importantly, they are terrible at complex reasoning (more specifically, they mix up quantifiers and don't really understand the complex logical structure of many arguments), and
   4) they (current LLMs) cannot backtrack when they find that what they already wrote turned out not to lead to a solution, and it is too expensive to give them the thousands of restarts they'd require to randomly guess their way through the problem if you did give them that facility.<p>Solving grade-school problems might mean progress on 1 and 2, but that is not at all impressive, as there are perfectly good tools out there that already solve those problems just fine, and old-style AI researchers have built perfectly good tools for 3. The hard problem is problem 4, and it is something you teach people how to do at university level.<p>(I should add that another important problem is what is known as premise selection. I didn't list it because LLMs have actually been shown to manage it OK in about 70% of theorems, which basically matches records set by other machine-learning techniques.)<p>(Real mathematical research also involves what is known as lemma conjecturing. I have never once observed an LLM do it, and I suspect they cannot. Basically, either the parameter set of the LLM dedicated to mathematical reasoning is large enough to model the entire solution from end to end, or the LLM is likely to completely fail to solve the problem.)<p>I personally think this entire article is likely complete bunk.<p>Edit: after reading the replies I realise I should have pointed out that humans do not simply backtrack. They learn from failed attempts in ways that LLMs do not seem to. The material LLMs are trained on surely contributes to this problem.</p>
]]></description><pubDate>Thu, 23 Nov 2023 04:49:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=38389417</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=38389417</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38389417</guid></item><item><title><![CDATA[New comment by wbhart in "GPT-4 is not getting worse"]]></title><description><![CDATA[
<p>The tendency to begin summarising is very annoying. I'd assumed it was because of the limited attention span of human raters, who rated summarised or shorter outputs more highly, and I'd assumed this had been there from the beginning.<p>I encountered it when doing some research into getting GPT-4 to reliably multiply n-digit numbers. Up to 8x8-digit multiplications it doesn't do this very much, but by 10x10 it is almost impossible to get it to stop.<p>When the multiplications become even larger, it seems to be literally impossible to prevent.</p>
]]></description><pubDate>Sat, 16 Sep 2023 15:26:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=37535613</link><dc:creator>wbhart</dc:creator><comments>https://news.ycombinator.com/item?id=37535613</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37535613</guid></item></channel></rss>