289 points by sandslash | 3 comments
jandrewrogers ◴[] No.44452056[source]
I appreciate the video and generally agree with Fei-Fei but I think it almost understates how different the problem of reasoning about the physical world actually is.

Most dynamics of the physical world are sparse, non-linear systems at every level of resolution. Most ways of constructing accurate models mathematically don’t actually work. LLMs, for better or worse, are pretty classic (in an algorithmic information theory sense) sequential induction problems. We’ve known for well over a decade that you cannot cram real-world spatial dynamics into those models. It is a clear impedance mismatch.

There are a bunch of fundamental computer science problems that stand in the way, which I was schooled on in 2006 from the brightest minds in the field. For example, how do you represent arbitrary spatial relationships on computers in a general and scalable way? There are no solutions in the public data structures and algorithms literature. We know that universal solutions can’t exist and that all practical solutions require exotic high-dimensionality computational constructs that human brains will struggle to reason about. This has been the status quo since the 1980s. This particular set of problems is hard for a reason.

I vigorously agree that the ability to reason about spatiotemporal dynamics is critical to general AI. But the computer science required is so different from classical AI research that I don’t expect any pure AI researcher to bridge that gap. The other aspect is that this area of research became highly developed over two decades but is not in the public literature.

One of the big questions I have had since they announced the company is who on their team is an expert in the dark, state-of-the-art computer science needed to work around these particular problems. They risk running straight into the same deep, layered theory walls that almost everyone else has run into. I can’t identify anyone on the team who is an expert in a relevant area of computer science theory, which makes me skeptical to some extent. It is a nice idea but I don’t get the sense they understand the true nature of the problem.

Nonetheless, I agree that it is important!

teemur ◴[] No.44453124[source]
> We know that universal solutions can’t exist and that all practical solutions require exotic high-dimensionality computational constructs that human brains will struggle to reason about. This has been the status quo since the 1980s. This particular set of problems is hard for a reason.

This made me a bit curious. Would you have any pointers to books/articles/search terms for someone who wanted to take a deeper look at this problem space and where we are?

jandrewrogers ◴[] No.44456747[source]
I'm not aware of any convenient literature but it is relatively obvious once someone explains it to you (as it was explained to me).

At its root it is a cutting problem, like graph cutting but much more general because it includes things like non-trivial geometric types and relationships. Solving the cutting problem is necessary to efficiently shard/parallelize operations over the data models.
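
(To make the simpler special case concrete, here is a toy Python sketch of graph cutting viewed as sharding; the graph, the balanced two-way split, and the brute-force search are all invented for illustration and are not the general construction being described.)

    # Nodes are records, edges are relationships; a good "cut" assigns nodes
    # to shards so that few relationships cross shard boundaries.
    from itertools import combinations

    edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"), ("e", "f")}
    nodes = sorted({n for e in edges for n in e})

    def cross_edges(assignment):
        # Cost of the cut: relationships that span two shards.
        return sum(1 for u, v in edges if assignment[u] != assignment[v])

    best = None
    for shard0 in combinations(nodes, len(nodes) // 2):   # balanced two-way split, brute force
        assignment = {n: (0 if n in shard0 else 1) for n in nodes}
        cost = cross_edges(assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)

    print("min cross-shard relationships:", best[0])
    print("shard assignment:", best[1])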

For classic scalar data models, representations that preserve the relationships have the same dimensionality as the underlying data model. A set of points in two dimensions can always be represented in two dimensions such that it satisfies the cutting problem (e.g. a quadtree-like representation).
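
(A minimal sketch of that scalar case, assuming a fixed-depth quadtree-style split over the unit square; the function name and parameters are invented for the illustration.)

    def quad_shard(point, bounds=(0.0, 0.0, 1.0, 1.0), depth=3):
        # Return a quadrant path (e.g. "SW/SW/NW") identifying the shard of a 2D point.
        x, y = point
        xmin, ymin, xmax, ymax = bounds
        path = []
        for _ in range(depth):
            xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
            if x < xmid:
                xmax, ew = xmid, "W"
            else:
                xmin, ew = xmid, "E"
            if y < ymid:
                ymax, ns = ymid, "S"
            else:
                ymin, ns = ymid, "N"
            path.append(ns + ew)
        return "/".join(path)

    # Nearby points share shard prefixes, so the 2D structure itself is the cut.
    for p in [(0.1, 0.2), (0.12, 0.22), (0.9, 0.8)]:
        print(p, "->", quad_shard(p))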

For non-scalar types like rectangles, operations like equality and intersection are distinct and there are an unbounded number of relationships that must be preserved that touch on concepts like size and aspect ratio to satisfy cutting requirements. The only way to expose these additional relationships to cutting algorithms is to encode and embed these other relationships in a (much) higher dimensionality space and then cut that space instead.
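
(For illustration only: the classic "corner transform" from spatial indexing is one well-known way to embed rectangles as points in a higher-dimensional space. It is not necessarily the embedding construction being discussed here, but it shows the flavor: each rectangle becomes a point in R^4 and an intersection query becomes a 4D range predicate.)

    # Each axis-aligned rectangle becomes a point in R^4, so a heavily
    # overlapping set of rectangles becomes a plain point set that a
    # point-cutting scheme could shard.

    def embed(rect):
        # (xmin, ymin, xmax, ymax) -> a point in 4D.
        return tuple(float(v) for v in rect)

    def intersection_box(query):
        # Rectangles intersecting `query` are exactly the embedded points with
        # xmin <= qxmax, ymin <= qymax, xmax >= qxmin, ymax >= qymin.
        qxmin, qymin, qxmax, qymax = query
        inf = float("inf")
        return ((-inf, qxmax), (-inf, qymax), (qxmin, inf), (qymin, inf))

    def in_box(point, box):
        return all(lo <= c <= hi for c, (lo, hi) in zip(point, box))

    rects = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
    query = (1.5, 1.5, 2.5, 2.5)
    print([r for r in rects if in_box(embed(r), intersection_box(query))])

(Cutting that 4D point space so that heavily overlapping rectangles don't all pile into the same shard is where the difficulty described here begins.)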

The mathematically general case isn't computable but real-world data models don't need it to be. Several decades ago it was determined that if you constrain the properties of the data model tightly enough then it should be possible to systematically construct a finite high-dimensionality embedding for that data model such that it satisfies the cutting problem.

Unfortunately, the "should be possible" understates the difficulty. There is no computer science literature for how one might go about constructing these cuttable embeddings, not even for a narrow subset of practical cases. The activity is also primarily one of designing data structures and algorithms that can represent complex relationships among objects with shape and size in dimensions much greater than three, which is cognitively difficult. Many smart people have tried and failed over the years. It has a lot of subtlety and you need practical implementations to have good properties as software.

About 20 years ago, long before "big data", the iPhone, or any current software fashion, this and several related problems were the subject of an ambitious government research program. It was technically successful, demonstrably. That program was killed in the early 2010s for unrelated reasons and much of that research was semi-lost. It was so far ahead of its time that few people saw the utility of it. There are still people around that were either directly involved or learned the computer science second-hand from someone that was but there aren't that many left.

1. calf ◴[] No.44457535[source]
But then that sounds more like that person explained it wrong. They didn't explain why it is necessary to reduce to GRAPHCUT; it seems to me to beg the question. We should not assume this is true based on some vague anthropomorphic appeal to spatial locality, surely?
2. jandrewrogers ◴[] No.44458384[source]
It isn’t a graph cutting problem; graph cutting is just a simpler special case of this more general cutting problem (h/t IBM Research). If you can solve the general problem you effectively get efficient graph cutting for free. This is obviously attractive to the extent you can do both complex spatial and graph computation at scale on the same data structure instead of specializing for one or the other.

The challenge with cutting e.g. rectangles into uniform subsets is that logical shard assignment must be identical regardless of insertion order and in the absence of an ordering function, with O(1) space complexity and without loss of selectivity. Arbitrary sets of rectangles overlap, sometimes heavily, which is the source of most difficulty.
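
(A Python sketch of only the easy half of that requirement: a shard assignment that is a pure function of the rectangle itself, hence identical under any insertion order and O(1) in state. The grid cell size, shard count, and hash are arbitrary choices for the sketch, and it deliberately sidesteps the hard part noted above: heavy overlap and loss of selectivity.)

    import hashlib

    NUM_SHARDS = 16
    CELL = 1.0   # coarse quantization of the 4D embedding; arbitrary for the sketch

    def shard_of(rect):
        # Deterministically map (xmin, ymin, xmax, ymax) to a shard id.
        key = tuple(int(v // CELL) for v in rect)
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % NUM_SHARDS

    rects = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
    print([shard_of(r) for r in rects])                              # same shards...
    print(list(reversed([shard_of(r) for r in reversed(rects)])))    # ...in any insertion order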

Of course, in practical implementations write scalability matters and incremental construction is desirable.

3. calf ◴[] No.44468820[source]
Well, previously you said that it (presumably "it" broadly refers to spatial reasoning AI) is a "high dimensional complex type cutting problem".

You said this is obvious once explained. I don't see it as obvious; rather, I see it as begging the question: the research program you were secretly involved in wanted to parallelize the engineering of it, so obviously they needed some fancy "cutting algorithm" to make that possible.

The problem is that this conflates the engineering framing with a scientific statement of what "spatial reasoning" is. There's no obvious explanation of why spatial reasoning should intuitively be some kind of cutting problem, however you wish to define or generalize a cutting problem. That's not how good CS research is done or explained.

In fact I could (mimicking your broad assertions) go so far as to claim that the project was doomed to fail because they weren't really trying to understand something; they wanted to make something, without understanding it as the priority. So they were constrained by the parallel technology they had at the time, and when the available computational power didn't pan out they reached a natural dead end.