93 points walz | 1 comment

I was walking around New York last month during some light rain and noticed about half the people had umbrellas open. When the rain picked up a few minutes later, that number jumped closer to 80%.

It got me thinking it'd be cool to track this somehow, so I built a website! I take a sidewalk livestream, feed it into a YOLO model for people tracking, then send a frame of each detected person to Gemini 2.0 Flash, which returns structured JSON about each person's clothing and whether they're holding an umbrella. I also had fun making the site look like a TV weather channel.
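For anyone curious about the last step, here is a minimal sketch of turning per-person JSON replies into an umbrella percentage. It is only an illustration under assumptions: the `holding_umbrella` field, the `PersonObservation` class, and both function names are hypothetical, and the real schema and aggregation on the site may differ. It assumes the tracker hands out a stable integer ID per person and keeps the most recent answer for each ID, so one person seen in several frames counts once.

```python
import json
from dataclasses import dataclass


@dataclass
class PersonObservation:
    track_id: int       # ID assigned by the people tracker (assumed stable per person)
    has_umbrella: bool  # parsed from the vision model's JSON reply


def parse_observation(track_id: int, raw_json: str) -> PersonObservation:
    """Parse one JSON reply; `holding_umbrella` is a hypothetical field name."""
    data = json.loads(raw_json)
    return PersonObservation(track_id, bool(data.get("holding_umbrella", False)))


def umbrella_share(observations: list[PersonObservation]) -> float:
    """Fraction of distinct tracked people whose latest observation shows an umbrella."""
    latest: dict[int, bool] = {}
    for obs in observations:
        latest[obs.track_id] = obs.has_umbrella  # later frames overwrite earlier ones
    if not latest:
        return 0.0
    return sum(latest.values()) / len(latest)
```

Feeding it two replies for track 1 (umbrella) and one for track 2 (no umbrella) gives a share of 0.5, since track 1 is only counted once.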

I showed some friends this project and someone mentioned how the legendary Tasks xkcd comic (https://xkcd.com/1425) is out of date now. If you want to check whether a photo has birds in it (or if someone is holding an umbrella), you can just ask an inexpensive vision model for JSON.

1. dylan604 No.44366753
Based on the position of the camera and the current time of day, the sun is overpowering, and the camera's exposure adjustment washes out a lot of detail. If the system can still extract the information it's looking for from that image, I'm impressed. You can't even tell what the sky looks like from this image. I'd try hooding the lens to see if you get a better image when the sun is shining directly into it like this.

What would happen if someone geolocates your camera and just plants a bunch of umbrellas in the frame? Does the counter require the umbrella to be held by a human? What if the same person walks past the camera multiple times? Are they considered unique counts, or are you recognizing people and logging that?

TL;DR: how robust is your system against mischief?