This has huge implications for everything from competitive pricing, to understanding store layouts, to creating your own grocery store inflation monitor. Just discreetly take a video and process it.
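As a rough sketch of the inflation-monitor idea, assume you've already extracted (item, price) pairs from two store visits. A fixed-basket index then just compares what the same items cost each time. The item names and prices below are made up for illustration, and `basket_inflation` is a hypothetical helper, not part of any tool.

```python
from decimal import Decimal


def basket_inflation(before: dict[str, Decimal], after: dict[str, Decimal]) -> Decimal:
    """Percent change in the total cost of the items seen on both visits."""
    common = before.keys() & after.keys()  # only compare items present both times
    if not common:
        raise ValueError("no overlapping items to compare")
    old_total = sum(before[item] for item in common)
    new_total = sum(after[item] for item in common)
    return (new_total - old_total) / old_total * 100


# Hypothetical prices pulled from two shelf videos a month apart
march = {"milk 1L": Decimal("1.20"), "eggs x12": Decimal("3.00"), "bread": Decimal("2.30")}
april = {"milk 1L": Decimal("1.26"), "eggs x12": Decimal("3.30"), "bread": Decimal("2.30")}
print(f"{basket_inflation(march, april):.1f}%")  # 5.5%
```

Using `Decimal` rather than `float` keeps the price arithmetic exact, which matters once you start comparing small month-to-month changes.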
And the models have only gotten better.
Even smaller stores have been monitoring their competitors for a long time.
> your own grocery store inflation monitor
You could also check your itemized bill.
I've tried bouncing this off some Google employees, and the general vibe I got back from them was that Google is very good at running stuff like this at a scale that drives down the cost of individual queries, so they seemed confident that these prices are not a loss-leader strategy.
I don't know if I can believe that though. It's just SO cheap!
Sure, you could do this, and it would work, but you'd spend about 100,000x what I do with a $10 Hetzner VPS and a small amount of proxy bandwidth.
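For contrast, when a page already serves plain HTML, pulling the prices out takes a few lines of standard-library Python. This is a sketch, not the commenter's actual scraper: the `price` class name and the markup are hypothetical, and real sites need per-site selectors (and, as noted, proxies).

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of every element whose class list includes "price".

    The "price" class is a hypothetical stand-in; each real site needs its own selector.
    """

    def __init__(self) -> None:
        super().__init__()
        self.open_tag = None  # tag name of the price element we're currently inside
        self.depth = 0        # nesting depth of that same tag name
        self.prices = []      # one string per price element found

    def handle_starttag(self, tag, attrs):
        if self.open_tag is not None:
            if tag == self.open_tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self.open_tag, self.depth = tag, 1
            self.prices.append("")

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.prices[-1] += data.strip()  # joins split markup like "$1.<sup>29</sup>"


page = ('<li><span class="name">Eggs x12</span><span class="price">$3.49</span></li>'
        '<li><span class="name">Milk</span><span class="price">$1.<sup>29</sup></span></li>')
parser = PriceParser()
parser.feed(page)
print(parser.prices)  # ['$3.49', '$1.29']
```

The fragility mentioned elsewhere in the thread shows up here too: if the site renames that class, the parser silently returns nothing.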
"Analyze this screenshot and tell me if everything seems legit"
And it could tell the difference between a legit screenshot and a phishing attempt, without me telling it which site name was legit. And it was pretty detailed: "There's an 'l' in 'interactlvebrokers.ie' that is made to look like an 'i'." Or something like that.
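What the model spotted is similar in spirit to a classic lookalike-domain test: map easily confused characters to a common "skeleton" and flag a domain whose skeleton matches a trusted one while its actual text doesn't. This only illustrates that idea; it is not how Gemini does it, and the tiny confusables table is a hypothetical stand-in for the much larger confusables data in Unicode TR39.

```python
# Deliberately tiny, hypothetical confusables table; Unicode TR39 publishes a far larger one.
CONFUSABLES = {"1": "l", "i": "l", "0": "o", "rn": "m", "vv": "w"}


def skeleton(domain: str) -> str:
    """Collapse look-alike characters so visually similar domains compare equal."""
    s = domain.lower()
    for lookalike, canonical in CONFUSABLES.items():
        s = s.replace(lookalike, canonical)
    return s


def looks_spoofed(candidate: str, trusted: set[str]) -> bool:
    """True if candidate isn't a trusted domain but looks like one."""
    if candidate.lower() in {t.lower() for t in trusted}:
        return False
    return any(skeleton(candidate) == skeleton(t) for t in trusted)


TRUSTED = {"interactivebrokers.ie"}
print(looks_spoofed("interactlvebrokers.ie", TRUSTED))  # True
print(looks_spoofed("interactivebrokers.ie", TRUSTED))  # False
```

The catch is that you have to know the trusted name in advance; the impressive part of the anecdote is that the model didn't need to be told.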
We're already getting shiny new helper tools, and we're going to get *lots* more.
But only for the things you buy.
Note, too, that some big retail stores actually have a "license" or "contract" for customers tucked away behind the service desk, and video recording is often one of the things it forbids. It's not "illegal" to record, but if they catch you and insist you leave, and you refuse, now you're trespassing, and that has legal consequences.
It does process a version of the raw video, but it can run through it faster than real-time playback.
There's quite a bit of detail here: https://ai.google.dev/gemini-api/docs/vision?lang=python#pro...
"The File API service extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second."
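To make the quoted numbers concrete: at 1 frame per second, a 35-second screen recording is sampled down to about 35 still images, even if the phone captured it at something like 30 FPS (about 1,050 frames). The sketch below just does that arithmetic; starting the samples at t=0 and the 30 FPS capture rate are assumptions, not something the docs state, and `sample_timestamps` is a hypothetical helper.

```python
import math


def sample_timestamps(duration_s: float, sample_fps: float = 1.0) -> list[float]:
    """Seconds at which frames would be grabbed, assuming sampling starts at t=0."""
    count = math.ceil(duration_s * sample_fps)
    return [i / sample_fps for i in range(count)]


duration = 35    # length of the screen recording, in seconds
source_fps = 30  # assumed capture rate of the phone's screen recorder
stamps = sample_timestamps(duration)
print(len(stamps), "sampled frames vs", duration * source_fps, "captured frames")
print(stamps[:3])  # [0.0, 1.0, 2.0]
```

That roughly 30x reduction is a big part of why the model can get through a recording faster than watching it.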
A common task is to take photos of shelves and the products/pricing on them.
It's framed as "make sure our employees are doing it correctly", but based on the strict image requirements (needed for later computer processing) I have a feeling it's actually a competitor trying to get shelf and price info. It feels a little cloak-and-dagger.
But it just hurts my programmer soul that it's somehow more effective to record an app that first renders (semi-)structured text into pixels, record those millions of pixels into a video, send that over the network to a cloud, and run it through a neural network with billions of parameters, than it is to access the 1 kilobyte of text that's already loaded in memory and process it locally.
And yes, there are workflows to do that, as demonstrated by other comments, but it's a lot of effort that will be constantly thwarted by apps changing their data structures, obfuscating whatever structure they have, or simply being so layered and complicated that it's hard to get to the data.
>I could have clicked through the emails and copied out the data manually one at a time. This is error prone and kind of boring. For twelve emails it would have been OK, but for a hundred it would have been a real pain.
This seems contradictory to me.
Navigating to and then copying and pasting specific text out of twelve different emails (and then removing commas and dollar signs and reformatting dates as YYYY-MM-DD) is still a whole lot more work than watching a 35s video to check that it did the tedious data entry for you.
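For comparison, here's roughly what the manual cleanup described above looks like as code, once the raw strings have been copied out. It's a sketch: the input date formats are assumptions about what the emails contain, `%b`/`%B` month names assume an English (C) locale, and both helpers are hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical input formats; adjust to whatever the emails actually use.
DATE_FORMATS = ("%b %d, %Y", "%B %d, %Y", "%m/%d/%Y")


def clean_amount(text: str) -> Decimal:
    """Strip dollar signs and thousands separators: '$1,234.50' -> Decimal('1234.50')."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def clean_date(text: str) -> str:
    """Reformat a date as YYYY-MM-DD: 'Jan 5, 2024' -> '2024-01-05'."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognised date: {text!r}")


print(clean_amount("$1,234.50"), clean_date("Jan 5, 2024"), clean_date("03/17/2024"))
# 1234.50 2024-01-05 2024-03-17
```

The code is the easy part; getting the strings out of twelve (or a hundred) emails is the tedious bit the video approach skips.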
For the 100 email version I'd do more of a spot check, depending on how high the stakes were.
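A spot check can be as simple as pulling a random handful of extracted rows and comparing them against the source emails by hand. This is a minimal sketch with hypothetical row labels; `spot_check_sample` is an illustrative helper, and the fixed seed just makes the same sample reproducible if you need to recheck it.

```python
import random
from typing import Optional, Sequence


def spot_check_sample(rows: Sequence, k: int, seed: Optional[int] = None) -> list:
    """Pick up to k distinct rows at random to verify by hand against the source emails."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(list(rows), min(k, len(rows)))


# Hypothetical extracted rows, one per email
rows = [f"email-{i:03d}" for i in range(100)]
to_check = spot_check_sample(rows, 10, seed=42)
print(to_check)
```

How many to check depends on the stakes. As a rough guide, if n checked rows turn up no errors, the "rule of three" puts an approximate 95% upper bound of 3/n on the error rate, so checking 30 rows only rules out error rates above about 10%.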