There may be an interesting opportunity to gather data on the accuracy of guesses per image. You could use something like Google Analytics, but simple server-side logging is more private and keeps the page light.
The questions could be: Which images are most often mistaken? What characteristics do they share? Knowing which images have the highest false-negative rates would be really valuable, because it tells people what not to ignore.
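As a rough sketch of what that analysis might look like, assume each logged guess is a tuple of `(image_id, is_positive, guessed_positive)`. That record shape, the function name `false_negative_rates`, and the `min_guesses` threshold are all made up for illustration; the real log format would depend on how the server stores guesses.

```python
from collections import defaultdict


def false_negative_rates(records, min_guesses=1):
    """Rank truly positive images by how often they were guessed negative.

    records: iterable of (image_id, is_positive, guessed_positive) tuples.
    This tuple layout is a hypothetical log format, not an existing one.
    """
    totals = defaultdict(int)  # guesses seen per positive image
    misses = defaultdict(int)  # guesses that wrongly said "negative"

    for image_id, is_positive, guessed_positive in records:
        if not is_positive:
            continue  # a false negative can only occur on a positive image
        totals[image_id] += 1
        if not guessed_positive:
            misses[image_id] += 1

    rates = {
        image_id: misses[image_id] / count
        for image_id, count in totals.items()
        if count >= min_guesses  # skip images with too few guesses to judge
    }
    # Highest false-negative rate first
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample log entries
sample = [
    ("a.jpg", True, False),
    ("a.jpg", True, True),
    ("b.jpg", True, True),
    ("c.jpg", False, True),
]
print(false_negative_rates(sample))  # [('a.jpg', 0.5), ('b.jpg', 0.0)]
```

Raising `min_guesses` would keep an image with a single unlucky guess from topping the list. The images at the top are then the ones worth inspecting by hand for shared characteristics.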