Cool. Do you have a write-up of the technical details or a tutorial on how you did this? I'm not familiar with the tech you mentioned, but it'd be interesting to see how it's done, and done so... easily? cheaply? by non-mega-organizations?
replies(1):
It got me thinking it'd be cool to track this somehow, so I built a website! I'm taking a sidewalk livestream, feeding it into a YOLO model for person tracking, then sending a frame of each detected person to Gemini 2.0 Flash, which returns structured JSON describing each person's clothing and whether they're holding an umbrella. I also had fun making the site look like a TV weather channel.
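For readers asking how the pieces fit together, here is a minimal sketch of that flow (detect people, cut out each one, ask a vision model for JSON, validate the answer). It is not the author's code. The detector and the vision model are passed in as plain callables, so `detect_people` and `ask_vision_model` are hypothetical stand-ins for a YOLO wrapper and a Gemini call. The JSON field names `clothing` and `holding_umbrella`, and the choice to crop each person by bounding box, are likewise assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels, x2/y2 exclusive
Frame = Sequence[Sequence[int]]   # rows of pixels; stand-in for a real image array


@dataclass
class PersonReport:
    clothing: str            # hypothetical field name
    holding_umbrella: bool   # hypothetical field name


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut the bounding box out of the frame."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in frame[y1:y2]]


def parse_report(raw_json: str) -> PersonReport:
    """Parse and validate the model's JSON reply; raise on anything unexpected."""
    data = json.loads(raw_json)
    clothing = data["clothing"]
    umbrella = data["holding_umbrella"]
    if not isinstance(clothing, str) or not isinstance(umbrella, bool):
        raise ValueError("unexpected field types in model reply")
    return PersonReport(clothing=clothing, holding_umbrella=umbrella)


def describe_people(
    frame: Frame,
    detect_people: Callable[[Frame], List[Box]],          # e.g. wraps a YOLO model
    ask_vision_model: Callable[[List[List[int]]], str],   # e.g. wraps a Gemini call
) -> List[PersonReport]:
    """Run one frame through the detect -> crop -> describe pipeline."""
    reports = []
    for box in detect_people(frame):
        raw = ask_vision_model(crop(frame, box))
        try:
            reports.append(parse_report(raw))
        except (ValueError, KeyError, TypeError):
            # Model replies are not guaranteed to be well-formed; skip bad ones.
            continue
    return reports
```

Keeping the two models behind callables means the pipeline can be exercised with stubs before any real detector or API key is wired in, and validating the reply guards against the occasional malformed answer.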
I showed some friends this project and someone mentioned how the legendary Tasks xkcd comic (https://xkcd.com/1425) is out of date now. If you want to check whether a photo has birds in it (or if someone is holding an umbrella), you can just ask an inexpensive vision model for JSON.