You might find them used to accelerate research math by helping them with lemmas and checking for errors, and formalizing proofs. That seems realistic in the next couple of years.
I'll bet the most technical open problems will be the ones to fall first. What AIs lack in creativity they make up for in ability to absorb a large quantity of technical concepts.
(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)
Talking about things like solving the Reimman hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.
Three days per problem is, by many standards, a 'reasonable' amount of time. However there are still unanswered questions, notably that 'three days' is not really meaningful in and of itself. How parallelized was the computation; what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.
I imagine they are under pressure not to make this mistake again.
What does this have to do with a hypothetical automatic theorem prover?
If you want a computer to be able to tell that it found a correct proof once it enumerates it, you need a bit more than that (really just the existence of automated proof checkers is enough for that).
Hilbert’s program was a (failed) attempt to determine, loosely speaking, whether there was a process or procedure that could discover all mathematical truths. Any theorem depends on the formal system you start with, but the deeper implicit question is: where do the axioms come from and can we discover all of them (answer: “unknown” and “no”)?
> A speedup in the movement of the maths frontier would be worth many power stations
who is it 'worth' it to? And to what end? I can say with some confidence that many (likely most, albeit certainly not all) mathematicians do not want data centers and power stations to guzzle energy and do their math for them. It's largely a vision imposed from without by Silicon Valley and Google research teams. What do they want it for and why is it (at least for now) "worth" it to them?
Personally, I don't believe for a second that they want it for the good of the mathematical community. Of course, a few of their individual researchers might have their own personal and altruistic motivations; however I don't think this is so relevant.
If it just solves a few formalized problems with formalized theorems, not so much. You can write a program that solves ALL the problems under formalized theorems already. It just runs very slowly.
I can imagine many uses for flows where LLM’s can implement the outer layers above.
Side point, there is no existing AI which can prove - for example - the Poincaré conjecture, even though that has already been proved. The details of the proof are far too dense for any present chatbot like ChatGPT to handle, and nothing like AlphaProof is able either since the scope of the proof is well out of the reach of Lean or any other formal theorem proving environment.
Maths is detached from reality. An AI capable of doing math better than humans may not be able do drive a car, as driving a car requires a good understanding of the world, it has to recognize object and understand their behavior, for example, understanding that a tree won't move but a person might, but it will move slower than another car. It has to understand the physics of the car: inertia, grip, control,... It may even have to understand human psychology and make ethical choices.
Fully autonomous robots would be the endgame.
For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?
Or have I totally misunderstood what people mean when they say that developments in automatic theorem proving will solve LLM's hallucination problem?
First, you need to encode "memes" and relations between them at scale. This is not a language problem, it is data handling problem.
Second, at some point of time you will need to query memes and relations between them, again, at scale. While expression of queries is a language problem, an implementation will heavily use what SQL engines does use.
And you need to look at Cyc: https://en.wikipedia.org/wiki/Cyc
It does what you are toing do for 40 (forty) years now.
It can also be achieved informally and in a fragments way in barely-mathematical disciplines, like biology, linguistics, and even history. We have chains of logical conclusions that do not follow strictly, but with various probabilistic limitations, and under modal logic of sorts. Several contradictory chains follow under the different (modal) assumptions / hypotheses, and often both should be considered. This is where probabilistic models like LLMs could work together with formal logic tools and huge databases of facts and observations, being the proverbial astute reader.
In some more relaxed semi-disciplines, like sociology, psychology, or philosophy, we have a hodgepodge of contradictory, poorly defined notions and hand-wavy reasoning (I don't speak about Wittgenstein here, but more about Freud, Foucault, Derrida, etc.) Here, I think, the current crop of LLMs is applicable most directly, with few augmentations. Still a much, much wider window of context might be required to make it actually productive, by the standards of the field.
What I meant is that there's no AI you can ask to explain the details of Perelman's proof. For example, if there's a lemma or a delicate point in a proof that you don't understand, you can't ask an AI to clarify it.
Generally, I think many people who haven't studied mathematics don't realize how huge the gulf is between "being logical/reasonable" and applying mathematical logic as in a complicated proof. Neither is really of any help for the other. I think this is actually the orthodox position among mathematicians; it's mostly people who might have taken an undergraduate math class or two who might think of one as a gateway to the other. (However there are certainly some basic commonalities between the two. For example, the converse error is important to understand in both.)
> This takes for granted a formal setting, which is what I'm questioning in any of these 'real world' contexts.
A formal model of semantics would likely be a low-level physical representation of possible states augmented with sound definitions of higher-level concepts and objects. I don't think humans are capable of developing a formal semantics that would work for your sentences (it's taken us hundreds of years to approach formalization of particle physics), but I think that an automated prover with access to physical experiments and an LLM could probably start building a more comprehensive semantics.
It will be interesting if/when these models start proving major open problems, e.g. the Riemann Hypothesis. The sociological impact on the mathematical community would certainly be acute, and likely lead to a seismic shift in the understanding of what research-level mathematics is 'for'. This discussion already appears to be in progress. As an outsider I have no idea what the timeline is for such things (2 years? 10? 100?).
On the plus side, AlphaProof has the benefit over ordinary LLMs in their current form in that it does not pollute our common epistemological well, and its output is eminently interrogable (if you know Lean at last).
No, in my experience the whole practical question is, can it be found automatically, or can it not be found automatically? Because there is an exponential search space that conventional automated methods will not be able to chew through, it either works, or it doesn't. AlphaProof shows that for some difficult IMO problems, it works.
But logic is very relevant to "being logical/reasonable", and seeing how mathematicians apply logic in their proofs is very relevant, and a starting point for more complex applications. I see mathematics as the simplest kind of application of logic you can have if you use only your brain for thinking, and not also a computer.
"Being logical/reasonable" also contains a big chunk of intuition/experience, and that is where machine learning will make a big difference.
But I cannot see how consistently doing general mathematics (as in finding interesting and useful statements to proof, and then finding the proofs) is easier than consistently cutting hair/driving a car.
We might get LLM level mathematics, but not Human level mathematics, in the same way that we can get LLM level films (something like Avengers, or the final season of GoT), but we are not going to get Human level films.
I suspect that there are no general level mathematics without the geometric experience of humans, so for general level mathematics one has to go through perceptions and interactions with reality first. In that case, general mathematics is one level over "laying bricks or cutting hair", so more complex. And the paradox is only a paradox for superficial reasoning.
But as things look now, I will be willing to bet that the next major breakthrough in maths will be touted as being AI/LLMs and coming out of one of the big US tech companies rather than some German university.
Why? Simply, the money is much bigger. Such an event would pop the market value of the company involved by a hundred billion - plenty of incentive right there to paint whatever as AI and hire whoever.
Are there any checks for the consistency of all facts?
If Cyc has 100000 axioms/facts, then that's a problem.
Then again, it's fairly likely that P=?NP is decidable but we just don't have any idea how to prove it. In that case the question is more or less "what's the time horizon until AI is vastly better than humans at formal math?", to which the answer is certainly "we don't know, there may be obstacles, just wait and see".
If a computer proves the Reimann Hypothesis, someone will say "Oh of course, writing a proof doesn't require intelligence, it's merely the dumb application of logical rules, but only a human could have thought of the conjecture to begin with."
For all we know, buried deep in AlphaProof's attempts to solve these toy problems, it already tried and discarded several new ideas.
Hilbert's program failed in no contradiction to what GP wrote because the language of FOL theorems is only recursively enumerable and not decidable. It's obvious that something is true if you've found a proof, but if you haven't found a proof yet, is the theorem wrong or do you simply have to wait a little longer?
The main "absolute" difficulty there is in understanding and shaping what the mathematical audience thinks is "interesting". So it's really a marketing problem. Given how these tools are being used for marketing, I would have high hopes, at least for this particular aspect...
>An inference engine is a computer program that tries to derive answers from a knowledge base. The Cyc inference engine performs general logical deduction.[8] It also performs inductive reasoning, statistical machine learning and symbolic machine learning, and abductive reasoning. The Cyc inference engine separates the epistemological problem from the heuristicproblem. For the latter, Cyc used a community-of-agents architecture in which specialized modules, each with its own algorithm, became prioritized if they could make progress on the sub-problem.
Not really.
They can probably isolate 100k facts from some domain for you if you pay for it, but their idea is to run inference on millions of "common sense" and everyday facts and have been for decades.
Now what we've seen with e.g. Chess and Go, is that when you can give a tensor model "real experience" at the speed of however many GPUs you have, the model is quickly capable of superhuman performance. Automatic theorem proving means you can give the model "real experience" without armies of humans writing down proofs, you can generate and validate proofs as fast as a computer will let you and the LLM has "real experience" with every new proof it generates along with an objective cost function, was the proof correct?
Now, this might not let us give the LLMs real experience with being a teacher or dealing with Marxist revolutionaries, but it would be a sea change in the ability of LLMs to do math, and it probably would let us give LLMs real experience in writing software.
> how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language?
Totally out of scope in the any near future for me at least. But that doesn't prevent it from being super useful for a narrower scope. For example:
- How might we take a statement like "(x + 1) (x - 5) = 0" and encode it formally?
- Or "(X X^T)^-1 X Y = B"?
- Or "6 Fe_2 + 3 H_2 O -> ?"?
We can't really do this for a huge swath of pretty narrow applied problems. In the first, what kind of thing is X? Is it an element of an algebraically closed field? In the second, are those matrices of real numbers? In the third, is that 6 times F times e_2 or 6 2-element iron molecules?
We can't formally interpret those as written, but you and I can easily tell what's meant. Meanwhile, current ways of formally writing those things up is a massive pain in the ass. Anything with a decent amount of common sense can tell you what is probably meant formally. We know that we can't have an algorithm that's right 100% of the time for a lot of relevant things, but 99.99% is pretty useful. If clippy says 'these look like matrices, right?' and is almost always right, then it's almost always saving you lots of time and letting lots more people benefit from formal methods with a much lower barrier to entry.
From there, it's easy to imagine coverage and accuracy of formalizable statements going up and up and up until so much is automated that it looks kinda like 'chatting about real-life statements' again. I doubt that's the path, but from a 'make existing useful formal methods super accessible' lens it doesn't have to be.
You probably say "now a human can verify it" but what if the humans are wrong? What is the source of truth?
The source of truth here is the code we wrote for the formal verification system.