You can do it using the more classic technique of photogrammetry. There are also commercial products used by real estate agents to produce high-quality "games" where you walk around inside a house, but they're more like Google Street View, where you swoosh between the points where a 360-degree photo was taken. Both of those approaches will be more faithful to the real place than neurally generating the next frames from previous frames and control input.