Fei-Fei Li: Spatial intelligence is the next frontier in AI [video]

(www.youtube.com)

289 points sandslash | 2 comments | 01 Jul 25 14:00 UTC


jandrewrogers ◴[03 Jul 25 05:56 UTC] No.44452056[source]▶

I appreciate the video and generally agree with Fei-Fei but I think it almost understates how different the problem of reasoning about the physical world actually is.

Most dynamics of the physical world are sparse, non-linear systems at every level of resolution. Most ways of constructing accurate models mathematically don’t actually work. LLMs, for better or worse, are pretty classic (in an algorithmic information theory sense) sequential induction problems. We’ve known for well over a decade that you cannot cram real-world spatial dynamics into those models. It is a clear impedance mismatch.

There are a bunch of fundamental computer science problems that stand in the way, which I was schooled on in 2006 from the brightest minds in the field. For example, how do you represent arbitrary spatial relationships on computers in a general and scalable way? There are no solutions in the public data structures and algorithms literature. We know that universal solutions can’t exist and that all practical solutions require exotic high-dimensionality computational constructs that human brains will struggle to reason about. This has been the status quo since the 1980s. This particular set of problems is hard for a reason.

I vigorously agree that the ability to reason about spatiotemporal dynamics is critical to general AI. But the computer science required is so different from classical AI research that I don’t expect any pure AI researcher to bridge that gap. The other aspect is that this area of research became highly developed over two decades but is not in the public literature.

One of the big questions I have had since they announced the company is: who on their team is an expert in the dark state-of-the-art computer science for working around these particular problems? They risk running straight into the same deep, layered theory walls that almost everyone else has run into. I can’t identify anyone on the team who is an expert in a relevant area of computer science theory, which makes me skeptical to some extent. It is a nice idea but I don’t get the sense they understand the true nature of the problem.

Nonetheless, I agree that it is important!


ccozan ◴[03 Jul 25 06:21 UTC] No.44452178[source]▶

>>44452056 #

I agree that the problem is hard. However, the biological brain is able to handle it quite "easily" (it is not really easy: billions of iterations were needed). Current brains solve this 3D physical world _only_ via perception.

So this is the place where we must look. It starts with sensing and the integration of that sensing. I have been working on this problem for more than 10 years and have arrived at some results. I am not a real scientist but a true engineer, and I look at it quite intensely from that perspective. The question one must ask is: how do you define the outside physical world from the perspective of a biological sensing "device"? What exactly are we "seeing" or "hearing"? So yes, working on that question brought me further in defining the physical world.


1. tmilard ◴[03 Jul 25 14:47 UTC] No.44455626[source]▶

>>44452178 #

I do agree with you. We have a natural-eye automaton (what you call a 'biological brain') that unconsciously 'feels' the geometric structure of the places we enter.

Once this layer of "natural-eye automaton" is programmed behind a camera, it will spit out this crude geometry: the Spatial-data-bulk (SDB). This SDB is small data.

From then on, our programs will reason not on the raw data from the camera(s) but only on this small SDB.

This is how I see it.


2. tmilard ◴[03 Jul 25 14:49 UTC] No.44455638[source]▶

>>44455626 (TP) #

==> And now the LLMs, to get a feel for spatial knowledge, will have a much reduced dataset. This will make spatial data reasoning far less intensive than we can imagine.
