It captures two billion pixels per second. Essentially, he captures the same scene many times (presumably 921,600 times to form a full 720p picture), watching a single pixel at a time, and composites all the captures together to form frames.
I suppose that for entirely deterministic and repeatable scenes, where you also don't care too much about noise and you have infinite time on your hands to capture 1 ms of footage, then yes, you can effectively visualize 2B frames per second! But not capture them.
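For a sense of scale, here is the arithmetic behind that as a rough Python sketch. It assumes the figures from the comment above (2 billion samples per second, 1 ms of footage, one scene repeat per pixel of a 1280x720 frame) and only gives a lower bound, since it ignores the time needed to reset the scene and re-aim the sensor between repeats.

```python
SAMPLE_RATE_HZ = 2_000_000_000  # 2 billion samples per second
CLIP_MS = 1                     # 1 ms of footage
PIXELS = 1280 * 720             # 921,600: one scene repeat per pixel (assumed)

samples_per_pixel = SAMPLE_RATE_HZ * CLIP_MS // 1000  # samples in each 1 ms recording
total_samples = samples_per_pixel * PIXELS            # samples across all repeats
min_capture_ms = PIXELS * CLIP_MS                     # lower bound: recording time only
min_capture_s = min_capture_ms / 1000

print(f"{samples_per_pixel:,} samples per pixel")     # 2,000,000
print(f"{total_samples:,} samples in total")          # 1,843,200,000,000
print(f"at least {min_capture_s} s (~{min_capture_s / 60:.1f} min)")
```

Even under those generous assumptions, that is over a quarter of an hour of repeats for 1 ms of "footage".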
As you say: It does capture two billion pixels per second. It does watch a single pixel at a time, 921,600 times. And these pixels [each individually recorded at 2B FPS] are ultimately used to create a composition that embodies a 1280x720 video.
That's all correct.
And your summary is also correct: It definitely does not really capture 2 billion frames per second.
Unless we're severely distorting the definition of a "video frame" to also include "one image in a series of images that can be as small as one pixel," accomplishing 2B entire frames per second is madness with today's technology.
As stated at ~3:43 in the video: "Basically, if you want to record video at 2 billion frames per second, you pretty much can't. Not at any reasonable resolution, with any reasonably-accessible consumer technology, for any remotely reasonable price. Which is why setups like this kind of cheat."
You appear to be in complete agreement with AlphaPhoenix, the presenter of this very finely-produced video.
What is your definition of "video frame" if not this?
> that can be as small as one pixel
Why would this be a criterion for the images? If it is, what is the minimum resolution to count as a video frame? Must I have at least two pixels for some reason? Four, so that I have a grid? These seem like weird constraints to attach to the definition when they don't enable anything that the 1x1 camera doesn't, nor are devices that capture them meaningfully harder to build.
I agree the final result presented to the viewer is a composite... but it seems to me that it's a composite of a million videos.
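A minimal Python sketch of that view, assuming each single-pixel recording was triggered at the same point in an identical repeat of the scene, so sample t means the same instant in every recording. The `composite` function and the dict layout are hypothetical illustrations, not the presenter's actual pipeline, and the demo uses a 3x2 "frame" instead of 1280x720.

```python
def composite(recordings, width, height):
    """Stitch per-pixel time series into frames.

    recordings maps (x, y) -> list of samples taken while the single-pixel
    sensor was aimed at that spot during one repeat of the scene.
    Returns frames indexed as frames[t][y][x].
    """
    lengths = {len(samples) for samples in recordings.values()}
    if len(lengths) != 1:
        raise ValueError("need recordings that all have the same number of samples")
    n_samples = lengths.pop()
    # Frame t takes sample t from every one of the width * height recordings.
    return [
        [[recordings[(x, y)][t] for x in range(width)] for y in range(height)]
        for t in range(n_samples)
    ]


# Tiny demo: a 3x2 "video" built from six 1-pixel recordings of 4 samples each.
demo = {(x, y): [10 * y + x + 100 * t for t in range(4)]
        for x in range(3) for y in range(2)}
frames = composite(demo, 3, 2)
print(len(frames))  # 4
print(frames[0])    # [[0, 1, 2], [10, 11, 12]]
```

Each list in `recordings` is, in these terms, a one-pixel video; `composite` just transposes a pile of them into ordinary frames.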
If I were to agree with this, then would you be willing to agree that the single-pixel ambient light sensor adorning many pocket supercomputers is a camera?
And that recording a series of samples from this sensor would result in a video?
It takes >900k pixels to shoot one frame of that (amazing) video, and acquiring each of those pixels required physically moving a mirror along X and Y to align the single-pixel camera properly.
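The scanning half of that can be sketched in Python too, purely as an illustration under stated assumptions: the serpentine (boustrophedon) order, the linear pixel-to-angle mapping, and the `fov_x_deg`/`fov_y_deg` parameters are hypothetical choices, not details from the video. A serpentine path is one plausible option because consecutive targets are always one pixel apart, so the mirror never has to fly back across the frame.

```python
def serpentine_scan(width, height):
    """Yield (x, y) targets row by row, reversing direction on odd rows,
    so each step moves the aim point by exactly one pixel."""
    for y in range(height):
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            yield x, y


def aim_angles(x, y, width, height, fov_x_deg, fov_y_deg):
    """Map a pixel centre to a (pan, tilt) viewing direction in degrees,
    centred on 0. Assumes angle is linear across the field of view.
    A flat mirror only needs to rotate half as far, since tilting a mirror
    by theta turns the reflected ray by 2 * theta."""
    pan = (x + 0.5) / width * fov_x_deg - fov_x_deg / 2
    tilt = (y + 0.5) / height * fov_y_deg - fov_y_deg / 2
    return pan, tilt


targets = list(serpentine_scan(3, 2))
print(targets)  # [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
# A full 720p frame needs one aim point (and one scene repeat) per pixel.
print(sum(1 for _ in serpentine_scan(1280, 720)))  # 921600
```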
There isn't really a shutter at all, whether mechanical or electronic. And from my understanding, a "rolling shutter" usually refers to reading out an image sensor row by row (typical of CMOS sensors), or to some mechanical aspect of a film camera, such as a focal-plane shutter.
But this isn't an array of anything. It's just a pixel, and some very clever work with motors, lenses, and mirrors.
(Up next: Someone will show up to tell me that an array of 1 item is still an array. yawn)