How did you train this? I'm thinking there isn't reference output for live video frame to splats so supervised learning doesn't work.
Is there some temporal accumulation?
replies(1):
Supervised learning actually does work. Suppose you have four cameras. You input the three of them into the net and use the fourth as the ground truth. The live video aspect just emerges from re-running the neural net every frame.