Splatt3R: Zero-Shot Gaussian Splatting from Uncalibrated Image Pairs

1. S0y ◴[27 Aug 24 15:10 UTC] No.41368437[source]▶

This is really awesome. A question for someone who knows more about this: How much harder would it be to make this work using any number of photos? I'm assuming this is the end goal for a model like this.

Imagine being able to create an accurate enough 3D rendering of any interior with just a bunch of snapshots anyone can take with their phone.

replies(2): >>41368608 #>>41368627 #

2. dagmx ◴[27 Aug 24 15:25 UTC] No.41368608[source]▶

>>41368437 (TP) #

That’s already how Gaussian splats work.

The novelty of Splatt3R (though I contest that they’re the first to do so) is that it needs fewer images than usual.

replies(2): >>41368661 #>>41368790 #

3. Arkanum ◴[27 Aug 24 15:27 UTC] No.41368627[source]▶

>>41368437 (TP) #

Probably not much harder, but you wouldn't get the same massive jump in quality that you get going from 1 image to 2. NeRF/Gaussian Splatting in general is what you're describing, but from the looks of it, this just does it in a single forward pass rather than optimising the gaussian/network weights.

4. Arkanum ◴[27 Aug 24 15:30 UTC] No.41368661[source]▶

>>41368608 #

I think the novelty is that they don't have to optimise the splats at all: they're directly predicted in a single forward pass.

replies(1): >>41372531 #

5. GaggiX ◴[27 Aug 24 15:41 UTC] No.41368790[source]▶

>>41368608 #

The novelty here is that it does work on uncalibrated images.

replies(2): >>41369005 #>>41372536 #

6. milleramp ◴[27 Aug 24 16:02 UTC] No.41369005{3}[source]▶

>>41368790 #

Not really; it uses MASt3R to determine camera poses.

7. dagmx ◴[27 Aug 24 20:21 UTC] No.41372531{3}[source]▶

>>41368661 #

That’s not really novel either imho, though a Google search isn’t turning up the specific papers I saw at SIGGRAPH.

Imho it’s an interesting combination of technologies but not novel in and of itself.

8. dagmx ◴[27 Aug 24 20:22 UTC] No.41372536{3}[source]▶

>>41368790 #

A lot of splat systems do work on uncalibrated images, so that’s not novel either. They all just do a camera solve, which arguably isn’t terrible for a stereo pair with low divergence.