Fei-Fei Li: Spatial intelligence is the next frontier in AI [video]

(www.youtube.com)

289 points sandslash | 2 comments | 01 Jul 25 14:00 UTC

skwb ◴[03 Jul 25 05:32 UTC] No.44451927[source]▶

It's hard to describe, but it has felt like LLMs have sucked all the energy out of computer vision. Like... I know CVPR still happens and there's great research that comes out of it, but almost every single job posting in ML is about using LLMs to do this and that, to the detriment of computer vision.

replies(11): >>44452003 #>>44452047 #>>44452144 #>>44452268 #>>44454239 #>>44455214 #>>44455952 #>>44456130 #>>44456701 #>>44457560 #>>44460696 #

jgord ◴[03 Jul 25 06:15 UTC] No.44452144[source]▶

>>44451927 #

yeah, see my other comment.

To me it's totally obvious that we will have a plethora of very valuable startups who use RL techniques to solve real-world problems in practical areas of engineering .. and I just get blank stares when I talk about this :]

I've stopped saying AI when I mean ML or RL .. because people equate LLMs with AI.

We need better ML / RL algos for CV tasks:

  - detecting lines from pixels
  - detecting geometry in pointclouds
  - constructing 3D from stereo images, photogrammetry, 360 panoramas

These might be used by LLMs but are likely built using RL or 'classical' ML techniques, tapping into the vast parallel matmul compute we now have in GPUs, multicore CPUs, and NPUs.
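
For illustration, the "detecting geometry in pointclouds" item can be sketched with a classical technique: RANSAC plane fitting. This is a generic textbook approach in plain Python, not a method described in this thread; the function names `fit_plane` and `ransac_plane`, the 0.02 distance threshold, and the iteration count are all assumptions chosen for the example.

```python
import random

def fit_plane(p, q, r):
    """Plane through three points as (unit normal, d) with n.x + d = 0.
    Returns None if the points are (nearly) collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The normal is the cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    d = -(nx * p[0] + ny * p[1] + nz * p[2])
    return (nx, ny, nz), d

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return (plane, inlier_indices) for the plane supported by the most points.
    threshold is the maximum point-to-plane distance counted as an inlier
    (hypothetical value, in the same units as the points)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        (nx, ny, nz), d = plane
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(nx * x + ny * y + nz * z + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

Calling `ransac_plane(points)` on a list of `(x, y, z)` tuples returns the dominant plane; removing its inliers and repeating would peel off further planes one at a time. Every candidate plane is scored against every point, which is the kind of brute-force parallel work that maps well onto GPUs.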

replies(2): >>44452659 #>>44455693 #

1. tmilard ◴[03 Jul 25 14:54 UTC] No.44455693[source]▶

>>44452144 #

You said: "- detecting lines from pixels - detecting geometry in pointclouds - constructing 3D from stereo images, photogrammetry, 360 panoramas"

  ==> For me it is more something like:
   Source = raw video or photo pixels ===> find many simple rectangular surfaces that are glued to one another.

This is, for me, how you can easily get to detecting the rather complex geometry of any room.

replies(1): >>44460256 #

2. jgord ◴[04 Jul 25 00:34 UTC] No.44460256[source]▶

>>44455693 (TP) #

I kind of did a version of what you suggest - I think I linked to a video showing plane edges auto-detected in a pointcloud sample.

Similarly, I use another algo to detect pipe runs, which tend to appear as half cylinders in the pointcloud, as the scanner usually sees one side, and often the other side is hidden, hard to access, or up against a wall.
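
To see why a half cylinder is still enough to recover a pipe, here is a minimal sketch of an algebraic least-squares circle fit (often called the Kasa fit) applied to a 2D cross-section of the points. It is a standard technique swapped in for illustration, not the algorithm mentioned above; the name `fit_circle` and the assumption that the points have already been sliced perpendicular to the pipe axis are mine.

```python
import math

def fit_circle(points):
    """Least-squares circle through 2D points, returned as (cx, cy, radius).
    Works on partial arcs, e.g. the visible half of a pipe cross-section."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Centre the data first; this keeps the small linear system well conditioned.
    sxx = sxy = syy = sxz = syz = sz = 0.0
    for x, y in points:
        u, v = x - mx, y - my
        z = u * u + v * v
        sxx += u * u
        sxy += u * v
        syy += v * v
        sxz += u * z
        syz += v * z
        sz += z
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-18:
        raise ValueError("points are collinear or too few to define a circle")
    # Solve for D, E, F in u^2 + v^2 + D*u + E*v + F = 0 (centred coordinates).
    D = (-sxz * syy + syz * sxy) / det
    E = (-syz * sxx + sxz * sxy) / det
    F = -sz / n
    cu, cv = -D / 2.0, -E / 2.0
    radius = math.sqrt(cu * cu + cv * cv - F)
    return cu + mx, cv + my, radius
```

With noisy scan data the fit degrades as the visible arc gets shorter, which is one place where a learned model could plausibly improve on the plain heuristic.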

So, I guess my point is the devil is in the details .. and machine learning can optimize even further on good heuristics we might come up with.

Also, when you go thru a whole pointcloud, you have a lot of data to sift thru, so you want something fairly efficient, even if you're using multiple GPUs to do the heavy matmul lifting.

You can think of RL as an optimization - greatly speeding up something like Monte Carlo tree search, by learning to guess the best solution earlier.
