However, I could see this being useful for verified software development, which usually involves a huge number of tedious and uninteresting lemmas, and the sheer size of the proofs becomes exhausting. Having an LLM check all 2^8 configurations of something seems worthwhile.
Think -- why has nobody been able to make an LLM play Go better than AlphaZero while still retaining language capabilities? It certainly would have orders of magnitude more parameters.
This is Prof. Robinson on Kantian philosophy (check out the Oxford podcasts, by the way), and the quote is meant to imply that building a coherent world out of raw sensory data and statistics alone is completely and utterly impractical, if not outright impossible. While I don't think he meant to refer to any kind of AI, in my mind this description also aptly describes the general method of DL neural networks: repeated exposure to find correlation.
How does one find order through association alone? With AI this is no longer an academic problem; it has become practical. Kant says it is impossible, not just unlikely.
The Kantian project and the various core issues it tries to address seem readily applicable to AI research, yet I see very little mention of it. Perhaps I am just dumb, though. Building a mind capable of taming tremendous sensory flux needs to, at the very least, take note of the (many) fundamental issues he raised -- issues I feel are not at all trivial to set aside. I feel we are stuck in Hume's empiricist reasoning and have yet to graduate to Kant and beyond.
Are we now somehow convinced yet again that causality and reasoning will, in fact, spontaneously emerge out of pure chaos after all? Didn't we settle the impossibility of this a few hundred years ago?
Search is a great tool for AI, but you have to have a reasonable search space (like chess or Go or poker). I'm not close enough to Lean to know whether it can do something like that (I believe not), but math doesn't feel like a thing where you can reasonably "search" next steps, for most problems.
There was not enough detail to determine what was ultimately useful and truly better, if anything, so the lessons learned are near useless. Likewise, I could not tell how useful the fine-tuning was and why, versus other basic tricks that would avoid all this complexity. The work seems good, but I found almost no scientific value in the experimentation and reporting, so there is little for me to comment on that I normally would. We focus more on the coding and analysis side (logical QA on fuzzier questions), so I am genuinely curious, supportive, and informed, but I was left frustrated and wanting my time back.
For your comment to make sense it would have to include some context of computer science and artificial intelligence research, a significant amount of which is directly applicable to the topic you mention.
It's not that you are dumb, you are just being willfully ignorant. I'm not an expert on machine learning but I do know enough to say confidently that there is a lot of research about building world models from sensory data.
Hume's reasoning on this particular issue, and I am taking liberties here, boils down to the idea that anything can cause anything and there is no necessary connection between anything. At least, no connection we would be able to gather with our senses. The causal, necessary connection between one billiard ball causing a second ball to move is not to be found anywhere in raw sensory data itself. You will not find a "third element", the "causal relationship", anywhere. There is just raw sensory data: one ball coming from the left, two balls beside each other, and then one ball moving away to the right. The idea that one ball caused the other to move is made up. It is fiction. It is, at best, a habit of the mind to find those sorts of correlations and label them "causal". I dare you to find a flaw in that argument. As convincing as it is, it is pretty damning for any enterprise that wants to call itself scientific or even rational. Nothing we will experience, nothing we will ever think up, no matter how sophisticated, will, on a fundamental level, ever amount to anything more than "more or less probable".
This famously awoke Kant from his "dogmatic slumber". Luckily for him he found some problems in Hume's argument, and again I am taking liberties, because to entertain even the idea of an external world filled with objects like billiard balls presupposes the existence of tiny, slightly important things like, oh I don't know, time and space itself. Hume, where do you pull these from? You can look in raw sensory data for evidence of time and space for a long time and, like looking for causality, you'll come up empty-handed -- unless, and here is the point, you bring those notions with you and "wear those glasses", so to speak. You massage the data so it will fit the spatio-temporal domain, and now you can start making sense of it and not a figurative second sooner.
There are all sorts of parallels here with problems in AI (IMO). Neural networks are asked to infer concepts like time, space and causality by just looking at a lot of data and I can't help but be skeptical of success. The interesting thing to me here is that AI has made these dry and academic philosophical debates practical and useful. Hume talks about billiard balls, but it is easy to convert this into ML lingo by considering, say, some excitation of an array of artificial neurons that is followed by another configuration of excitation. What is their connection? How will you ever unearth "causality" from these completely unconnected events? Nothing about this problem has changed its nature at all in the past few hundred years.
If "causality" or "necessary connection" is too abstract for your taste, consider that to have, say, any type of memory mechanism at all, you have to have some sort of sensory apparatus -- say, a neuron or some multitude of them -- that is capable of holding event A and, some unit of time later, event B, and can then connect the two by assigning a probability of some kind between them. Is there any other way? Can you build memory without using a mechanism vaguely of this kind? But notice that you are bringing the notion of the temporal to the data instead of the other way around. Nothing about event A or event B can tell you what the nature of time is. You bring it inside your sensor, which has a "before" and an "after" slot. Kant would say "Aha! There it is. You could not find anything in the data, so you had to preprocess it in order to make it intelligible", but he would do it in dense, long-winded, inscrutable German. (He'd probably make fun of you without you knowing it as well.)
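The "before/after slot" mechanism described above can be sketched in a few lines of Python. This is purely my own toy illustration, not anyone's actual architecture: the point is that the temporal ordering lives in the sensor's structure (the `prev` slot), not in the events themselves.

```python
from collections import defaultdict

class TemporalSensor:
    """Toy 'sensor' with a built-in before/after structure.

    The temporal ordering is supplied by the sensor itself (the
    `prev` slot), not by the raw events -- which is the point of
    the argument above.
    """
    def __init__(self):
        self.prev = None                  # the "before" slot
        self.counts = defaultdict(int)    # (before, after) -> times seen
        self.totals = defaultdict(int)    # before -> times seen

    def observe(self, event):
        if self.prev is not None:
            self.counts[(self.prev, event)] += 1
            self.totals[self.prev] += 1
        self.prev = event                 # event becomes the new "before"

    def p_next(self, a, b):
        """Estimated probability that b follows a."""
        if self.totals[a] == 0:
            return 0.0
        return self.counts[(a, b)] / self.totals[a]

sensor = TemporalSensor()
for e in ["A", "B", "A", "B", "A", "C"]:
    sensor.observe(e)

print(sensor.p_next("A", "B"))  # 2 of the 3 transitions out of A go to B
```

The events themselves carry no temporal information at all; the "habit of the mind" is entirely in the bookkeeping.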
It is through the nature of this, in our case temporal, sensor that any kind of temporal connection can be made, not through the data itself. That is quite something, and I am having a hard time refuting this line of reasoning. If you need more than space, time and causality, you can consider the problem of "substance": how will you keep track of an object that alters its appearance? How do you "know" that some entity is merely changing appearance by, say, changing clothes or moving through a dark spot and thus being dimly lit all of a sudden, but is "essentially" the same? What's this "essentially"? How much of a sensory impression can change before it is a "different entity"? This problem has the same character as the temporal and causal problem. The data itself will not be illuminating unless you bring "substance glasses" with you.
Strong AI might be found implementing Kantian category sensors like Unity, Plurality, Causality, Substance, etc. A guy can dream, right?
I have elaborated in a sibling comment. I do not dare to ask you to consider it because it is quite loquacious, but on the off chance you want to indulge in some armchair philosophy, you can join me in my pedantry and enlighten a lost soul.
Science (the natural sciences) was originally known as "natural philosophy"; it was just the branch of philosophy that happened to be successful enough to stop being called philosophy. So arguably there's still a huge amount of value in the philosophical ideas and attitudes that led to such important discoveries.
As with most things in life, the truth is likely somewhere boring in the middle.
Can abstract concepts such as causality be "learned" directly from data? It's an important question, but answering such a question might not be necessary to train an AI model to make good predictions. It might be enough to show an AI model videos of billiard games, then to pause at random points in those videos and ask the model to predict what would happen next. (For example, the video could be paused at the very moment one billiard ball hit a second billiard ball.) If the AI model manages to correctly predict what the very next frame looks like (the second billiard ball starts to move), and it produces similar predictions in analogous situations of things colliding with other things, then it arguably "understands" this form of causality. If it fails to predict the next frame, the neural network training algorithm can adjust the neural network's neuron weights such that it will be inclined to make the correct prediction in the future. Over the course of many of these predictions and adjustments, over thousands or millions of hours of video, the expectation is that a system will emerge that is capable of making correct predictions in a variety of situations, as that means it has managed to develop general rules or "understand" causality.
As the AI developer I wouldn't even have to be aware of e.g. why a billiard ball could cause another to move, because if my neural network can make accurate predictions in such situations, it's useful and demonstrates understanding. Such an understanding may in fact require the neural network to internally represent such concepts as causality / laws of physics (as your discussion implies), but I would not have to concern myself with the details of how the neural network actually perceives the world, if the predictions are consistent and reliable. With large amounts of data and computation, the neural network weights could be tuned to create a network that is able to make good predictions in any domain at all. That's the idea anyway.
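To make the training story above concrete, here is a deliberately tiny sketch of my own. The "frames" are made-up 1-D ball positions, and a linear model stands in for the neural network; the true dynamics (`x_next = x + v`) are never shown to the model, only prediction errors.

```python
import random

# Toy stand-in for "watching billiard videos": a frame is just the
# ball's 1-D position, and the true (hidden) dynamics are
# x_next = x + v. The model never sees the physics; it only sees
# (frame, next frame) pairs and has its weights nudged to reduce
# prediction error.

w_x, w_v, b = 0.0, 0.0, 0.0   # model: predict x_next = w_x*x + w_v*v + b
lr = 0.01

random.seed(0)
for step in range(5000):
    x = random.uniform(-1, 1)   # current position
    v = random.uniform(-1, 1)   # current velocity (difference of two frames)
    target = x + v              # what actually happens next
    pred = w_x * x + w_v * v + b
    err = pred - target
    # gradient descent on the squared prediction error
    w_x -= lr * err * x
    w_v -= lr * err * v
    b   -= lr * err

print(round(w_x, 2), round(w_v, 2))  # both approach 1.0: the "law" was recovered
```

In this toy case the weights do converge on the underlying rule, which is exactly the optimistic scenario described above; whether that generalizes to real video is the open question.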
Despite the optimism, I think there's something missing. There are simpler problem domains, like SimpleLogic (https://arxiv.org/abs/2205.11502), where researchers show that learning logical reasoning from data is apparently impossible in practice, because various spurious statistical correlations in the data strongly encourage ML systems to learn incorrect approaches to solving logic problems, sort of like how using an incorrect method can sometimes still give you the correct answer in a maths problem. The problem is that the incorrect method might give valid answers in some situations, but will absolutely give wrong answers in others. I think this issue is a severe and possibly insurmountable flaw of the current approach to AI, despite the impressive results we've seen so far, because real-world data such as video, images, and text is far messier than the artificial examples used in this research paper. But who knows what the future will bring?
1. It becomes plausible to discuss, and find hints of, what "AI" might mean ("AGI" or "Strong AI").
2. Dig up all those crusty and dusty notions of philosophers long dead, and some nerdy reads from my teenage years, and try to apply them to an interesting system we can study empirically, instead of just humanistic pondering.
3. Same with psychology! Now that our pattern-matching systems are starting to pass various Turing tests, what does that mean for "sentience" and "sapience", and should we all become armchair psychologists as we teach our systems to understand and act in the world? Recall the first part of 2010: Odyssey Two, where the AI researcher is more of a child psychologist...
Lastly -- LLMs and "glimpses of AI" are something new! Not the same old recycled tech and pop culture. Truly something new under the sun. Good times.
Naively, it would seem like transformers could line up nicely with turn-based games. Instead of mapping tokens to language as in an LLM, they could map to valid moves given the current game state. And then instead of optimizing the next token for linguistic coherence as LLMs do, you optimize for winning the game.
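A minimal sketch of that idea, with an invented toy move vocabulary and a random stub standing in for the transformer's forward pass (everything here is hypothetical, just to show the shape of the interface):

```python
import numpy as np

# Sketch: instead of a vocabulary of words, the "tokens" are moves.
# A policy network (here a random stub) scores every move in the
# vocabulary; illegal moves are masked out before the softmax, and
# training would push probability mass toward winning moves instead
# of linguistically coherent ones.

MOVE_VOCAB = ["a1", "a2", "b1", "b2", "resign"]   # toy move vocabulary

def policy_logits(state_tokens):
    # stand-in for a transformer forward pass over the game state
    rng = np.random.default_rng(abs(hash(tuple(state_tokens))) % (2**32))
    return rng.normal(size=len(MOVE_VOCAB))

def move_distribution(state_tokens, legal_moves):
    logits = policy_logits(state_tokens)
    mask = np.array([m in legal_moves for m in MOVE_VOCAB])
    logits = np.where(mask, logits, -np.inf)      # mask illegal moves
    exp = np.exp(logits - logits[mask].max())     # stable softmax
    return exp / exp.sum()

# the game state so far is itself a sequence of move tokens
probs = move_distribution(["a1", "b2"], legal_moves={"a2", "b1"})
print(dict(zip(MOVE_VOCAB, probs.round(3))))
```

The legal-move mask is the key difference from language modeling: the "grammar" of the output space is exact and enforced, rather than learned.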
> Proof assistants like Lean have revolutionized mathematical proof verification...

> Both the synthetic dataset and the model will be made available to facilitate further research in this promising field.
Please don't use "revolutionised" or "promising" in a scientific abstract... If it has revolutionised the field, tell me instead what the important thing was that made that happen. If it is promising, tell me instead why.

Apart from that, the second sentence says:
> Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data.
Is it really? Or hindered by an unsuitable model architecture? Because you could perfectly well say that Lean is an AI capable of theorem proving... I feel that sentence should be the potential conclusion of the paper, not the first observation. Something like: "we used more training data and it works better; if we scale theorem-proving training data, we can get the performance we want, so training data really is what is hindering the advancement of formal theorem proving with LLMs".
However, such a reward system is not available for LLMs in an open-domain setting.
The text of the bitter lesson itself provides an interesting case, on the topic of computer vision: "Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better."
To me this proves the necessity of proper measurement devices -- in this case "convolution" and "certain kinds of invariances" -- to render data as intelligible as possible using as few moving parts as required. What I don't see in this is: "we need to drop all preexisting notions of structure and just focus on finding pure correlation only". The fact that we have moved to more fundamental search operations -- from "cylinder" to "convolution" -- does not IMO fundamentally alter the general strategy. In short, I think the lesson is being stretched beyond its limits.
The lesson tells us: "We want AI agents that can discover like we can, not which contain what we have discovered", and I couldn't agree more, yet we seem to disagree on what it means to "discover like we can". I happen to believe that raw correlation is just one of the key ingredients. In fact Kant has the notion of "community" that, and I might be sorely mistaken here, appears to capture the associative power of cognition in much the same way.
I, of course, lack any sort of credentials to further my point. I only imagine it might possibly be of slight interest to someone to squint at this topic from a skewed angle.
PS: Your point on pragmatism is well taken. I'm sure we are all quite excited to see what happens over the next couple of years, yet I cannot help but hear an ominous violin crescendo beneath your "if the predictions are consistent and reliable".
'Formal' here means inputs to a proof verifier/proof assistant, and I think most people would accept it as a given that, by the extremely data-hungry standards of LLMs, there is a severe lack of such data, and that this is a severe hindrance to getting an LLM to generalise. The question is how they might scale with more training data.
I was skeptical of the statistical approach myself, but I think it's time for some humility in the face of its wild success.
The old joke about 1 + 1 = 2 being on page 360 of Principia Mathematica still largely holds.
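For contrast, the same fact is immediate in a modern proof assistant. In Lean 4, both sides of the equation reduce to the same numeral, so reflexivity closes the goal in one line:

```lean
-- In Lean 4 the foundations do the heavy lifting up front:
-- `1 + 1` and `2` compute to the same value, so `rfl` suffices.
example : 1 + 1 = 2 := rfl
```

The hundreds of pages of Principia haven't vanished, of course; they've been amortized into the design of the type theory and its kernel.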
Edit: OK, they generate proofs with the LLM, then they validate them with the Lean 4 environment. I am not sure whether this is better than generating cooking recipes with an LLM and asking a cook whether they make sense.
In the context of automated theorem proving, proof search and proof checking could not be more different.
Proof search is undecidable for most logics. For that reason, heuristics are commonly used to guide proof search. This has always been the case; heuristic proof search predates the use of LLMs for theorem proving by decades.
Proof checking, on the other hand, is not just decidable but usually quite tractable and cheap. Therefore, proof checkers are typically deterministic and fast. Ideally the checker is also implemented in a small amount of relatively simple and heavily audited code.
For a high-level description of how provers are typically architected, see https://www.pls-lab.org/en/de_Bruijn_criterion (or https://lawrencecpaulson.github.io/2022/01/05/LCF.html which is linked to at the bottom of the above page and goes into a bit more historical detail and nuance).
> I am not sure if this is better than generating cooking recipes with a LLM and asking a cook if they make sense or not.
Conceptually it's not at all similar to asking a cook if a recipe is "correct". For lean proofs, "correct" has an entirely precise meaning. The "cook" (i.e., proof checker) is basically free (much less than $0.01/hr). The cook (proof checker) takes at most a few seconds to read and respond to any particular recipe. And the cook (i.e., checker) never makes mistakes.
The relationship between asking a cook if a recipe is good and asking Lean to check a proof is approximately the difference between asking a mediocre human computer to compute a derivative and asking a modern mechanical computer to do the same (using a classical algorithm for doing so, not an LLM; no one is using LLMs to check proofs -- there's really no point to doing so, because the checking problem is easy).
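To make the "never makes mistakes" point concrete, here is a small Lean 4 example (the theorem name is my own invention; `Nat.add_comm` is the standard library lemma):

```lean
-- A claim with a valid proof term: the checker accepts it.
theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Swap in a wrong proof term and the checker rejects it,
-- deterministically, with no judgment call involved:
-- theorem bad (a b : Nat) : a + b = b + a := rfl   -- error: type mismatch
```

Finding the right proof term is the hard, undecidable part (where an LLM might help); deciding whether a given term proves the claim is the cheap, mechanical part.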