
289 points by sandslash | 4 comments
jandrewrogers:
I appreciate the video and generally agree with Fei-Fei, but I think the talk almost understates how different the problem of reasoning about the physical world actually is.

Most dynamics of the physical world are sparse, non-linear systems at every level of resolution. Most ways of constructing accurate models mathematically don’t actually work. LLMs, for better or worse, are pretty classic (in an algorithmic information theory sense) sequential induction problems. We’ve known for well over a decade that you cannot cram real-world spatial dynamics into those models. It is a clear impedance mismatch.
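
To put the mismatch in symbols (a standard contrast, not anyone's specific model): an LLM commits to a strictly ordered, one-dimensional factorization, while physical dynamics couple every point of a field to its spatial neighborhood at every instant.

```latex
% Sequential induction: one token at a time, conditioned on a 1-D past
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t})

% Generic field dynamics: no privileged ordering; u evolves everywhere
% at once as a function of its local spatial structure
\partial_t u = F(u, \nabla u, \nabla^2 u)
```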

There are a bunch of fundamental computer science problems that stand in the way, which I was schooled on in 2006 from the brightest minds in the field. For example, how do you represent arbitrary spatial relationships on computers in a general and scalable way? There are no solutions in the public data structures and algorithms literature. We know that universal solutions can’t exist and that all practical solutions require exotic high-dimensionality computational constructs that human brains will struggle to reason about. This has been the status quo since the 1980s. This particular set of problems is hard for a reason.
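
As a deliberately naive toy illustration (mine, not a claim about any real system): even the simplest binary spatial relation, materialized explicitly, is quadratic in the number of objects, and "arbitrary" relationships (k-ary, nested, time-varying, frame-relative) only get combinatorially worse.

```python
from itertools import combinations

# Toy scene: 1,000 objects with made-up 3-D positions.
objects = {f"obj{i}": (float(i), 0.0, 0.0) for i in range(1000)}

def dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

# Materializing one binary relation (pairwise distance) is already
# O(n^2) in space: ~500k entries for 1,000 objects.
pairwise = {(i, j): dist(objects[i], objects[j])
            for i, j in combinations(objects, 2)}
print(len(pairwise))  # 499500
```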

I vigorously agree that the ability to reason about spatiotemporal dynamics is critical to general AI. But the computer science required is so different from classical AI research that I don’t expect any pure AI researcher to bridge that gap. The other aspect is that this area of research became highly developed over two decades but is not in the public literature.

One of the big questions I have had since they announced the company is who on their team is an expert in the dark state-of-the-art computer science with respect to working around these particular problems. They risk running straight into the same deep, layered theory walls that almost everyone else has run into. I can’t identify anyone on the team who is an expert in a relevant area of computer science theory, which makes me skeptical to some extent. It is a nice idea but I don’t get the sense they understand the true nature of the problem.

Nonetheless, I agree that it is important!

lsy:
To make this more concrete: ImageNet enabled computer "vision" by providing images + labels, enabling the computer to take an image and spit out a label. LLM training sets enable text completion by providing text + completions, enabling the computer to take a piece of text and spit out its completion. Learning how the physical world works (not just kind of works a la videogames, but actually works) is not only about a jillion times more complicated; there is also really only one usable dataset: the world itself, which cannot be compacted or fed into a computer at high speed.
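
To see how thin the interface is in both cases, here is the shape of the data and nothing more (illustrative values only):

```python
# ImageNet-style supervision: static (image, label) pairs.
imagenet_style = [("pixels_of_cat.jpg", "cat"),
                  ("pixels_of_dog.jpg", "dog")]

# LLM-style supervision: static (prefix, continuation) pairs.
llm_style = [("the quick brown", "fox"),
             ("to be or not to", "be")]

# Both reduce to lists you can shuffle and replay at arbitrary speed.
# The physical world is closed-loop: the next observation depends on
# the action just taken, so no fixed list of pairs can stand in for it.
```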

"Spatial awareness" itself is kind of a simplification: the idea that you can be aware of space or 3d objects' behavior without the social context of what an "object" is or how it relates to your own physical existence. Like you could have two essentially identical objects but they are not interchangeable (original Declaration of Independence vs a copy, etc). And many many other borderline-philosophical questions about when an object becomes two, etc.

coldtea:
>there is really only one usable dataset: the world itself, which cannot be compacted or fed into a computer at high speed.

Why wouldn't it be? If the world is ingested via video and lidar sensors, what's the hangup in recording that input and then replaying it faster?
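
Concretely, something like this toy record-and-replay loop (a hypothetical sketch; `frame` stands in for whatever the sensors emit):

```python
import time

log = []  # (timestamp, frame) pairs captured from camera/lidar

def record(frame):
    log.append((time.time(), frame))

def replay(speedup=100.0):
    """Yield recorded frames at `speedup` x real time."""
    t0, start = log[0][0], time.time()
    for t, frame in log:
        while time.time() - start < (t - t0) / speedup:
            pass  # wait until the compressed playback time arrives
        yield frame
```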

psb217:
I think there's an implicit assumption here that interaction with the world is critical for effective learning. In that case, you're bottlenecked by the speed of the world... when learning with a single agent. One neat thing about artificial computational agents, in contrast to natural biological agents, is that they can share the same brain and share lived experience, so the "speed of reality" bottleneck is much less of an issue.
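
A sketch of that "shared brain" pattern (toy code; the env/policy/model interfaces are hypothetical): many bodies act in the real world at real speed, but their experience pools into one buffer that a single learner consumes.

```python
import random
from collections import deque

buffer = deque(maxlen=1_000_000)  # experience pooled across all bodies

def actor_step(env, policy):
    """One body interacting with the world; every robot runs this."""
    obs = env.observe()
    act = policy(obs)
    nxt, reward = env.step(act)
    buffer.append((obs, act, reward, nxt))

def learner_step(model, batch_size=256):
    """The one shared brain trains on everyone's pooled experience."""
    if len(buffer) >= batch_size:
        model.update(random.sample(list(buffer), batch_size))
```
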
HappMacDonald:
Yeah I'm envisioning putting a thousand simplistic robotic "infants" into a vast "playpen" to gather sensor data about their environment, for some (probably smaller) number of deep learning models to ingest the input and guess at output strategies (move this servo, rotate this camshaft this far in that direction, etc) and make predictions about resulting changes to input.

In principle a thousand different deep learning models could all train simultaneously on a thousand different robot experience feeds. Not one-to-one, though, but many-to-many: each neural net training on data from dozens or hundreds of the robots at once, and different neural nets sharing those same feeds for their own rounds of training.

Then of course all of the input data, paired with the outputs tested and with the further inputs that serve as ground truth for the predictions, can be recorded for continued training sessions after the fact.
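
Roughly this wiring, as a toy sketch (names and the trainer API are hypothetical): each model subscribes to a few hundred feeds, every feed reaches several models, and everything lands in an offline log for later runs.

```python
import random

robots = [f"robot_{i:04d}" for i in range(1000)]
models = [f"model_{j:02d}" for j in range(10)]

# Many-to-many wiring: each model samples a few hundred feeds, so any
# one robot's experience typically reaches several models.
subscriptions = {m: set(random.sample(robots, k=300)) for m in models}

offline_log = []  # everything kept for after-the-fact training runs

def on_sensor_batch(robot_id, batch, trainers):
    offline_log.append((robot_id, batch))
    for m, feeds in subscriptions.items():
        if robot_id in feeds:
            trainers[m].train_on(batch)  # hypothetical trainer API
```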

loa_in_:
But the playpen will contain objects that are inherently breakable. You cannot rough-handle the glass vessel and have it too.
m-s-y:
The world is breakable. Any model based on it will need to know this anyway. Am I missing your argument?
devenson:
Can't reset state after breakage.

HappMacDonald (replying to loa_in_):
Basically everything applicable to the playpen of a human baby is applicable to the playpen of an AI robot baby in this setup, to at least some degree.

Perhaps the least applicable part is that "robot hurting itself" has the liability of some cost to replace the broken robot part, vs the potentially immeasurable cost of a human infant injuring themselves.

If it's not a good idea to put a "glass vessel" in a human crib (strictly from an "I don't want the glass vessel to be damaged" sense) then it's not a good idea to put that in the robot-infant crib either.

Give them something less expensive to repair, like a stack of blocks instead. :P