We've been working on this challenge in the satellite domain with https://earthgpt.app. It’s a subset of what Fei-Fei is describing, but comes with its own unique issues like handling multi-resolution sensors and imagery with hundreds of spectral bands. Think of it as computer vision, but in n-dimensions.
Happy to answer questions if you're curious. PS. still in early beta, so please be gentle!
replies(2):