Will be interested to hear about that. In the meantime, at least I learned of JShelter.
Edit:
Why not use the passage of time as the limiter? I guess it would still require JS though, unless there's some hack possible with CSS animations, like requesting an image with certain URL params only after an animation finishes.
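To make that half-idea a bit more concrete (mostly for myself), here's a rough Go sketch of the server half of such a time gate: hand out a URL carrying an HMAC-signed issue timestamp, and only honor it once enough wall-clock time has passed. All the names, the shared key and the 5-second threshold are made up for illustration, not anything the parent's scheme actually specifies.

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "net/http"
        "strconv"
        "time"
    )

    var key = []byte("server-side secret") // placeholder secret

    // sign produces an HMAC over the issue timestamp so clients can't forge one.
    func sign(ts string) string {
        m := hmac.New(sha256.New, key)
        m.Write([]byte(ts))
        return hex.EncodeToString(m.Sum(nil))
    }

    // issue hands out the time-gated URL, e.g. to embed as the CSS-triggered image.
    func issue(w http.ResponseWriter, r *http.Request) {
        ts := strconv.FormatInt(time.Now().Unix(), 10)
        fmt.Fprintf(w, "/gated?ts=%s&sig=%s\n", ts, sign(ts))
    }

    // gated serves the real content only if the signed timestamp is old enough.
    func gated(w http.ResponseWriter, r *http.Request) {
        ts, sig := r.URL.Query().Get("ts"), r.URL.Query().Get("sig")
        if !hmac.Equal([]byte(sig), []byte(sign(ts))) {
            http.Error(w, "bad signature", http.StatusForbidden)
            return
        }
        issued, err := strconv.ParseInt(ts, 10, 64)
        if err != nil || time.Since(time.Unix(issued, 0)) < 5*time.Second {
            http.Error(w, "too early, come back later", http.StatusTooEarly)
            return
        }
        fmt.Fprintln(w, "the actual content")
    }

    func main() {
        http.HandleFunc("/issue", issue)
        http.HandleFunc("/gated", gated)
        http.ListenAndServe(":8080", nil)
    }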
This does remind me of how all these additional hoops are making web browsing slow.
Edit #2:
Thinking even more about it, time could be made a hurdle by just... slowly serving incoming requests. No fancy timestamp signing + CSS animations or whatever trickery required.
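Something like this, roughly (Go again; the 2-second number is arbitrary):

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // slowDown wraps a handler and sleeps before serving, so every response
    // costs the client a fixed chunk of wall-clock time.
    func slowDown(delay time.Duration, next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            time.Sleep(delay)
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        content := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "the actual content, eventually")
        })
        http.Handle("/", slowDown(2*time.Second, content))
        http.ListenAndServe(":8080", nil)
    }

Though writing it out makes the catch obvious: the sleep only costs wall-clock time per connection, so a scraper that opens lots of parallel connections barely notices unless you also cap per-IP concurrency.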
I'm also not sure time would make at-scale scraping as much more expensive as PoW does. Time is money, sure, but that much? And I'm not sold on the UX of it either, though it could be mitigated somewhat by doing news-website-style "I'm only serving the first 20% of my content initially" stuff.
So yeah, will be curious to hear the non-JS solution. The easy way out would be a browser extension, but then it's not really non-JS, just JS compartmentalized, isn't it?
Edit #3:
Turning reasoning on for a moment, this whole thing is a bit iffy.
First of all, the goal is for a website operator to be able to control the use of information they disseminate to the general public via their website, specifically so that it won't be used for AI training. In principle, this is nonsensical. Sharing information with the general public (so, people) means said information eventually traverses a non-technological medium (air, as light) to reach a non-technological entity (a person). Any technological measure is therefore confined to the stretch before that medium, and can't touch the target on the other side of it either. Put differently, I can copy your website out by rote into a text editor, or, if scale is needed, point an OCR-equipped camera at the screen.
So in principle we're definitely hosed, but in practice you can try to hold onto the modality of "scraping for AI training" by leveraging the various technological fingerprints of such activity, which is how we get to at-scale PoW. But then this also combats every other kind of at-scale scraping, such as search engine crawling. You could whitelist specific search engines, but then you're engaging in anti-competitive measures, since smaller third-party search engines now have to magically get themselves onto your list. And even the ones that do might be lying about being just a search engine: Google, for example, may scrape your website for search, but will then 100% use it for AI training too.
So I don't really see any technological modality that would be able to properly single out AI-training-purposed scraping traffic for you to aim PoW or other methods at. You may decide to engage in this regardless based on statistical data, and just live with the negative aspects of your efforts, but then it's a bit iffy.
Finally, what about the energy-consumption-shaped elephant in the room? Using PoW for this goes basically directly against the spirit of wanting less energy spent on AI and co. That said, that may not be a goal for the author.
The more I think about this, the less sensible and agreeable it is. I don't know, man.