On the other hand, I'm not sure exactly what the details of Wikipedia's API TOS are. Also, as it stands this website is entirely in the frontend at the moment, and I'm enjoying just scaffolding out what I can with a more limited set of tools, so to speak.
I realize now the suffix "tok" implies a crazy ML algo that is trained on every single movement, click, tap, and pause you make, but I don't think I really want that.
Now, something that learns that if you like X you might like Y, even if they are disconnected, is closer to the dystopic ad-maximizing algorithm of TikTok et al.
https://en.wikipedia.org/wiki/Non-negative_matrix_factorizat...
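For the curious, here's a rough sketch of what that "liked X, might like Y" factorization could look like, assuming scikit-learn is installed; the matrix and all the numbers are made up for illustration:

    # Sketch: NMF over a (users x pages) interaction matrix.
    # Everything here is illustrative, not from the actual project.
    import numpy as np
    from sklearn.decomposition import NMF

    # Rows: users, columns: pages; entries: likes / dwell-time signals.
    interactions = np.array([
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [0, 0, 5, 4],
    ], dtype=float)

    model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
    user_factors = model.fit_transform(interactions)   # (users x topics)
    page_factors = model.components_                   # (topics x pages)

    # Reconstructed scores: high values for pages a user never touched
    # are the "you might also like Y" candidates.
    scores = user_factors @ page_factors
    unseen = interactions == 0
    for user in range(scores.shape[0]):
        candidates = np.where(unseen[user])[0]
        if candidates.size:
            best = candidates[np.argmax(scores[user, candidates])]
            print(f"user {user}: recommend page {best}")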
If I had a clue how to do this (sorry, just a neuroscientist), I would probably create "communities" of pages on a network graph and weight the traversal across the graph based on pages the person liked (or spent X time on before).
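Something like this toy weighted-walk sketch, maybe. The graph, community labels, and boost factor are all invented for illustration; real labels would come from a community-detection pass (Louvain or similar) over the actual link graph:

    # Toy sketch: a random walk over a page-link graph that favours
    # communities containing pages the user already liked.
    import random

    links = {
        "Neuron": ["Synapse", "Brain", "Ion channel"],
        "Synapse": ["Neuron", "Neurotransmitter"],
        "Brain": ["Neuron", "Cerebellum"],
        "Ion channel": ["Neuron"],
        "Neurotransmitter": ["Synapse"],
        "Cerebellum": ["Brain"],
    }
    community = {  # e.g. output of a community-detection pass
        "Neuron": 0, "Synapse": 0, "Ion channel": 0,
        "Neurotransmitter": 0, "Brain": 1, "Cerebellum": 1,
    }
    liked = {"Synapse"}  # pages the user liked or dwelled on

    def step(page: str) -> str:
        """Pick the next page, upweighting liked communities."""
        liked_communities = {community[p] for p in liked}
        neighbours = links[page]
        weights = [3.0 if community[n] in liked_communities else 1.0
                   for n in neighbours]
        return random.choices(neighbours, weights=weights, k=1)[0]

    page = "Brain"
    for _ in range(5):
        page = step(page)
        print(page)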
https://www.mediawiki.org/wiki/API:Etiquette
You are basically allowed to do whatever as long as it doesn't cause an operational issue, you don't have too many requests in-flight at one time, and you put contact info in the User-Agent or Api-User-Agent header. (Adding a unique Api-User-Agent header is probably the most important requirement, since if it does cause problems it lets the operations team easily see what is happening.)
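For example, a minimal polite client might look like this in Python with requests; the app name and contact details are placeholders you'd replace with your own:

    # Minimal client following the etiquette page: an identifying
    # header and serialized, modestly paced requests.
    import time
    import requests

    UA = "WikTok-demo/0.1 (https://example.com; contact@example.com)"
    HEADERS = {"User-Agent": UA, "Api-User-Agent": UA}
    API = "https://en.wikipedia.org/w/api.php"

    def fetch_summary(title: str) -> dict:
        # One request at a time, nothing in parallel.
        resp = requests.get(API, params={
            "action": "query", "prop": "extracts", "exintro": 1,
            "explaintext": 1, "titles": title, "format": "json",
        }, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        return resp.json()

    for title in ["Neuron", "Synapse"]:
        print(fetch_summary(title)["query"]["pages"])
        time.sleep(0.5)  # keep the request rate modest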
I think the WikTok thing is exactly the sort of thing the Wikimedia folks hope people will use the API to create.
Basically, we have an unbounded counter that is gonna start breaking things. So we need to normalize it to a percentage score (by dividing it by the total favoured count across all tags), or pass it through a logarithm to tame its growth.
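Both options in a few lines, with made-up counts:

    # Two ways to tame the unbounded per-tag counter (toy numbers).
    import math

    favoured = {"physics": 120, "biology": 30, "art": 6}

    # Option 1: normalize to a share of the total favoured count.
    total = sum(favoured.values())
    share = {tag: count / total for tag, count in favoured.items()}

    # Option 2: log-squash so the value grows very slowly.
    squashed = {tag: math.log1p(count) for tag, count in favoured.items()}

    print(share)     # {'physics': 0.769..., 'biology': 0.192..., 'art': 0.038...}
    print(squashed)  # {'physics': 4.79..., 'biology': 3.43..., 'art': 1.94...}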
This approach only works if all content is accurately tagged, which is true basically nowhere on the internet except Wikipedia.