
184 points hhs | 1 comment
whyowhy3484939 No.41840292
"Suppose you try to construct a coherent, ordered, natural world with no resource other than repeated exposure to things, and the formation of certain associative bonds. Oh, please!"

This is Prof. Robinson on Kantian philosophy - check out the Oxford podcasts, by the way - and the quote is meant to imply that building a coherent world out of raw sensory data and statistics alone is utterly impractical, if not outright impossible. While I don't think he meant to refer to any kind of AI, to my mind it also aptly describes the general method of deep neural networks: repeated exposure to find correlations.

How does one find order through association alone? With AI this is no longer an academic problem; it has become practical. Kant says it is impossible, not just unlikely.

The Kantian project and the core issues it tries to address seem readily applicable to AI research, yet I see very little mention of it. Perhaps I am just dumb, though. Building a mind capable of taming tremendous sensory flux needs, at the very least, to take note of the (many) fundamental issues he raised - issues I feel are not at all trivial to set aside. I feel we are stuck in Hume's empiricist reasoning and have yet to graduate to Kant and beyond.

Are we now somehow convinced, yet again, that causality and reasoning will after all spontaneously emerge out of pure chaos? Didn't we settle the impossibility of that a few hundred years ago?

replies(4): >>41840690 #>>41841404 #>>41842278 #>>41844232 #
viraptor No.41840690
The philosophy angle is interesting, of course, but are any of those claims proven true? Why would anyone stop trying to achieve something just because Kant's view of the world says it's impossible? Philosophies come and go and get refined over time. Meanwhile, you only need to find one edge case where they don't apply the way Kant imagined, or an area where the claim is moot in practice because you achieved all your goals anyway.
replies(2): >>41841687 #>>41844080 #
whyowhy3484939 No.41841687
I can appreciate this very practical stance and I naturally urge all my technical colleagues to persist in their struggle for pragmatic victory, but I can't help but voice some concerns that at the very least may lead to an illuminating response capable of at long last disabusing me of my critical notions. You may imagine similar concerns would perhaps arise in you if massive societal resources were to be invested in finding the "next biggest number" because a multitude of people have decided math can't be trusted or some loophole is thought to be found through empirical effort alone.

Hume's reasoning on this particular issue, and I am taking liberties here, boils down to the idea that anything can cause anything and there is no necessary connection between anything - at least, no connection we would be able to gather with our senses. The causal, necessary connection between one billiard ball causing a second ball to move is not to be found anywhere in the raw sensory data itself. You will not find a "third element", the "causal relationship", anywhere. There is just raw sensory data: one ball coming from the left, two balls beside each other, and then one ball moving away to the right. The idea that one ball caused the other to move is made up. It is fiction. It is, at best, a habit of the mind to find that sort of correlation and label it "causal". I dare you to find a flaw in that argument. As convincing as it is, it is pretty damning for any enterprise that wants to call itself scientific or even rational. Nothing we will experience, nothing we will ever think up, no matter how sophisticated, will, on a fundamental level, ever amount to anything more than "more or less probable".

This famously awoke Kant from his "dogmatic slumber". Luckily for him he found some problems in Hume's argument - and again I am taking liberties - because to entertain even the idea of an external world filled with objects like billiard balls presupposes the existence of tiny, slightly important things like, oh I don't know, time and space itself. Hume, where do you pull these from? You can look in raw sensory data for evidence of time and space for a long time and, like looking for causality, you'll come up empty-handed - unless, and here is the point, you bring those notions with you and "wear those glasses", so to speak. You massage the data so it will fit the spatio-temporal domain, and only then can you start making sense of it - not a figurative second sooner.

There are all sorts of parallels here with problems in AI (IMO). Neural networks are asked to infer concepts like time, space and causality just by looking at a lot of data, and I can't help but be skeptical of success. The interesting thing to me is that AI has made these dry, academic philosophical debates practical and useful. Hume talks about billiard balls, but it is easy to translate this into ML lingo by considering, say, one excitation of an array of artificial neurons that is followed by another configuration of excitation. What is their connection? How will you ever unearth "causality" from these completely unconnected events? Nothing about this problem has changed its nature in the past few hundred years.

If "causality" or "necessary connection" is too abstract for your taste, consider that to, say, have any type of memory mechanism at all you have to have some sort of sensory apparatus - say, a neuron or some multitude of them - that is capable of holding, say, event A and some unit of time later, event B and can then connect those two by assigning a probability of some kind between them. Is there any other way? Can you build memory without using a mechanism vaguely of this kind? But notice you are bringing the notion of the temporal to the data instead of the other way around. Nothing about event A or event B can tell you what the nature of time is. You bring it inside your sensor which has a "before" and "after" slot. Kant would say "Aha! There it is. You could not find anything in the data so you had to preprocess it in order to make it intelligible", but he would do it in dense, long-winded, inscrutable German. (He'd probably make fun of you without you knowing it as well.)

It is through the nature of this sensor - in our case a temporal one - that any kind of temporal connection can be made, not through the data itself. That is quite something, and I am having a hard time refuting this line of reasoning. If you need more than space, time and causality, consider the problem of "substance": how will you keep track of an object that alters its appearance? How do you "know" that some entity is merely changing appearance - say by changing clothes, or by moving through a dark spot and suddenly being dimly lit - but is "essentially" the same? What is this "essentially"? How much of a sensory impression can change before it becomes a "different entity"? This problem has the same character as the temporal and causal problem. The data itself will not be illuminating unless you bring "substance glasses" with you.

Strong AI might be found implementing Kantian category sensors like Unity, Plurality, Causality, Substance, etc. A guy can dream, right?
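
To make that "before and after slot" picture a bit more concrete, here is a deliberately trivial Python sketch - my own toy, not anyone's actual proposal - of such a temporal sensor. The point is that the before/after structure lives in the mechanism itself, not in the events it records, so whatever "temporal connection" it reports was brought to the data by its design:

```python
from collections import defaultdict

class TransitionMemory:
    """Toy 'temporal sensor': the (before, after) structure is built into
    the container itself, not discovered in the observations it stores."""

    def __init__(self):
        self.counts = defaultdict(int)   # (event_before, event_after) -> count
        self.totals = defaultdict(int)   # event_before -> count
        self.previous = None             # the "before" slot

    def observe(self, event):
        # Each new observation is filed as the "after" of whatever came before.
        if self.previous is not None:
            self.counts[(self.previous, event)] += 1
            self.totals[self.previous] += 1
        self.previous = event            # today's "after" is tomorrow's "before"

    def probability(self, before, after):
        # Estimated P(after | before) from the accumulated associations.
        if self.totals[before] == 0:
            return 0.0
        return self.counts[(before, after)] / self.totals[before]

memory = TransitionMemory()
for event in ["ball_A_rolls", "balls_touch", "ball_B_rolls"]:
    memory.observe(event)
print(memory.probability("balls_touch", "ball_B_rolls"))  # 1.0 after a single exposure
```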

replies(2): >>41843306 #>>41843900 #
ahkpasp No.41843306
I think today's AI researchers' preoccupation with learning directly from data in an effort to achieve AGI is understandable, because this data-driven approach has generated results that are far more impressive than all the previous approaches that involved humans implementing concepts or reverse-engineering how the human mind works in order to create a truly intelligent machine (see Sutton's Bitter Lesson http://www.incompleteideas.net/IncIdeas/BitterLesson.html). Because of the demonstrated success of these systems, a popular hypothesis is that increasingly large amounts of data and computation may be enough to solve all the remaining problems in AI.

Can abstract concepts such as causality be "learned" directly from data? It's an important question, but answering it might not be necessary to train an AI model to make good predictions. It might be enough to show an AI model videos of billiard games, then pause at random points in those videos and ask the model to predict what happens next. (For example, the video could be paused at the very moment one billiard ball hits a second billiard ball.) If the model correctly predicts what the very next frame looks like (the second billiard ball starts to move), and it produces similar predictions in analogous situations of things colliding with other things, then it arguably "understands" this form of causality. If it fails to predict the next frame, the training algorithm can adjust the network's weights so that it is more inclined to make the correct prediction in the future. Over the course of many such predictions and adjustments, over thousands or millions of hours of video, the expectation is that a system will emerge that makes correct predictions in a wide variety of situations, which would mean it has managed to develop general rules, i.e. to "understand" causality.
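
To sketch what that could look like in practice, here is a schematic PyTorch-style training loop. The tiny model and the random "video" batches are placeholders I made up purely for illustration, not any particular system:

```python
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Placeholder next-frame model: 8 context frames in, 1 predicted frame out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(8 * 3 * 64 * 64, 512), nn.ReLU(),
            nn.Linear(512, 3 * 64 * 64),
        )

    def forward(self, context_frames):            # (batch, 8, 3, 64, 64)
        return self.net(context_frames).view(-1, 3, 64, 64)

def video_batches():
    # Stand-in for "pause the video at a random point": yield 8 context frames
    # plus the true next frame. Random tensors here, real video in practice.
    for _ in range(100):
        yield torch.rand(4, 8, 3, 64, 64), torch.rand(4, 3, 64, 64)

model = FramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for context, next_frame in video_batches():
    predicted = model(context)                    # the model's guess at the next frame
    loss = loss_fn(predicted, next_frame)         # how wrong was the guess?
    optimizer.zero_grad()
    loss.backward()                               # nudge the weights toward better guesses
    optimizer.step()
```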

As the AI developer I wouldn't even have to be aware of, say, why one billiard ball causes another to move, because if my neural network can make accurate predictions in such situations, it's useful and demonstrates understanding. Such an understanding may in fact require the network to internally represent concepts like causality or the laws of physics (as your discussion implies), but I would not have to concern myself with the details of how it actually perceives the world, as long as the predictions are consistent and reliable. With large amounts of data and computation, the weights could be tuned to create a network that makes good predictions in any domain at all. That's the idea, anyway.

Despite the optimism, I think something is missing. There are simpler problem domains, like SimpleLogic (https://arxiv.org/abs/2205.11502), where researchers show that learning logical reasoning from data appears impossible in practice, because spurious statistical correlations in the data strongly encourage ML systems to learn incorrect approaches to solving logic problems - much as an incorrect method can sometimes still give you the right answer in a maths problem. The trouble is that the incorrect method may give valid answers in some situations but will certainly give wrong answers in others. I think this issue is a severe and possibly insurmountable flaw in the current approach to AI, despite the impressive results we've seen so far, because real-world data such as video, images and text is far messier than the artificial examples used in that paper. But who knows what the future will bring?
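
A crude toy illustration of that shortcut effect (my own contrived example, nothing like the SimpleLogic setup itself): if a spurious feature happens to track the answer during training, a model can score well without ever learning the actual rule, and then falls apart the moment the correlation is broken.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shortcut_reliability):
    x1 = rng.integers(0, 2, n)
    x2 = rng.integers(0, 2, n)
    y = x1 ^ x2                                    # the "real" rule: XOR of x1 and x2
    # A spurious feature that merely correlates with the answer.
    keep = rng.random(n) < shortcut_reliability
    x3 = np.where(keep, y, rng.integers(0, 2, n))
    return np.stack([x1, x2, x3], axis=1), y

X_train, y_train = make_data(10_000, shortcut_reliability=0.95)
X_test, y_test = make_data(10_000, shortcut_reliability=0.0)   # correlation broken

# A linear model cannot represent XOR, so it latches onto the shortcut feature.
model = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))   # ~0.97, thanks to the shortcut
print("test accuracy: ", model.score(X_test, y_test))     # ~0.50, the shortcut fails
```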

replies(1): >>41846304 #
whyowhy3484939 No.41846304
The current generation's interest is indeed quite understandably focused on - pardon the simplification - data-driven correlation engines, and the results have been nothing short of spectacular. Let it be clear that I do not reject the idea(s) behind the bitter lesson. On the contrary, I think it is spot on, yet I feel a false and mostly unspoken dichotomy is in many instances being read into those words: either A) raw probabilistic association lacking any and all structure, or B) full-scale good old-fashioned symbolic AI containing nothing but highly specific human knowledge.

The text of the bitter lesson itself provides an interesting case on the topic of computer vision: "Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better."

To me this proves the necessity of proper measurement devices - in this case "convolution" and "certain kinds of invariances" - to render data as intelligible as possible using as few moving parts as required. What I don't see in it is: "we need to drop all preexisting notions of structure and focus on finding pure correlation only". The fact that we have moved to more fundamental operations - from "cylinder" to "convolution" - does not, IMO, fundamentally alter the general strategy. In short, I think the lesson is being stretched beyond its limits.
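
For concreteness, a small PyTorch comparison (my own illustration, not anything from the essay): the convolution commits to locality and translational weight sharing before it has seen a single image - exactly the kind of "glasses" I mean - while a dense layer over the same input makes no such commitment and must learn any spatial structure from scratch.

```python
import torch.nn as nn

# The convolutional "measurement device": its parameter count is fixed by the
# kernel, independent of image size, because the same weights slide everywhere.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))    # 3*3*3*16 + 16 = 448

# A fully connected layer over the same 3x32x32 image builds in no such prior.
dense = nn.Linear(3 * 32 * 32, 16 * 32 * 32)
print(sum(p.numel() for p in dense.parameters()))   # ~50 million
```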

The lesson shows us: "We want AI agents that can discover like we can, not which contain what we have discovered", and I couldn't agree more, yet we seem to disagree on what it means to "discover like we can". I happen to believe that raw correlation is just one of the key ingredients. In fact, Kant has the notion of "community" that - and I might be sorely mistaken here - appears to capture the associative power of cognition in much the same way.

I, of course, lack any sort of credentials to further my point. I only imagine it might possibly be of slight interest to someone to squint upon this topic from a skewed angle.

PS: Your point on pragmatism is well taken. I'm sure we are all quite excited to see what happens over the next couple of years, yet I cannot help but hear an ominous violin crescendo beneath your "if the predictions are consistent and reliable".

replies(1): >>41847166 #
QuesnayJr No.41847166
There has been considerable progress since Sutton wrote that. The latest models don't even need convolutions and those kinds of invariances: Vision Transformer models don't necessarily even know which pixels are near each other. They figure it out through correlations.

I was skeptical of the statistical approach myself, but I think it's time for some humility in the face of its wild success.
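
(For what it's worth, the "don't know which pixels are near each other" part corresponds roughly to something like the simplified ViT-style patch embedding below - a sketch, not any specific implementation. Position enters only as a learned embedding added to each patch token; adjacency is never hard-coded.)

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Simplified ViT-style input stage: cut the image into patches, project each
    patch, and add a learned position embedding. Nothing here tells the model
    that patch 0 sits next to patch 1; any spatial structure must be picked up
    by attention during training."""

    def __init__(self, image_size=64, patch_size=8, channels=3, dim=128):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.patch_size = patch_size
        self.project = nn.Linear(channels * patch_size * patch_size, dim)
        self.positions = nn.Parameter(torch.zeros(1, num_patches, dim))

    def forward(self, images):                                  # (batch, 3, 64, 64)
        p = self.patch_size
        b, c, h, w = images.shape
        patches = images.unfold(2, p, p).unfold(3, p, p)        # (b, c, 8, 8, p, p)
        patches = patches.reshape(b, c, -1, p * p)              # (b, c, 64, p*p)
        patches = patches.permute(0, 2, 1, 3).reshape(b, -1, c * p * p)
        return self.project(patches) + self.positions           # (b, 64, 128)

tokens = PatchEmbedding()(torch.rand(2, 3, 64, 64))
print(tokens.shape)   # torch.Size([2, 64, 128])
```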