Ok, I'll tell you some of the funny parts... I've kinda enjoyed keeping the "secret" long enough.
My aggregator is actually a giant Greasemonkey script. Greasemonkey can just do XML requests cross-domain. HTML is great for creating links that open in the browser, and I don't want to be alt-tabbing from one window to the other. This also makes me uninterested in the <description> element, which consumes a huge amount of space. I'm really discarding everything except [the pubDate, the headline, a link to the content chopped into the [sub]domain, and the rest of the path]. It's a giant date-sorted array of arrays. New entries are pushed into a temp array, from which they are popped and spliced into the right spot in the big one. I usually keep 5000 entries (2000 is enough tho), so entry 5001 is removed.
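Roughly this shape (names and the exact tuple layout simplified):

    // entries are [timestamp, headline, host, path]; names illustrative
    var MAX_ENTRIES = 5000;
    var entries = [];  // the big date-sorted array of arrays, newest first
    var incoming = []; // temp array that freshly parsed items get pushed into

    function drainIncoming() {
      while (incoming.length) {
        var e = incoming.pop();
        var i = 0;
        while (i < entries.length && entries[i][0] > e[0]) i++; // find the right spot
        entries.splice(i, 0, e); // splice in, keeping the array date-sorted
      }
      if (entries.length > MAX_ENTRIES) entries.length = MAX_ENTRIES; // entry 5001 falls off
    }

At 5000 entries a linear scan plus splice is cheap anyway, no need for anything clever like binary search.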
It stays small data that way! I can periodically post it to some simple PHP script that turns it into a static HTML document. If the oldest result is newer than the newest from the previous dump, it dumps again.
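The check is just a timestamp comparison, something like (endpoint and payload format simplified):

    var DUMP_URL = 'https://example.com/dump.php'; // placeholder endpoint
    var lastDumpNewest = 0; // newest timestamp at the time of the previous dump

    function maybeDump() {
      if (entries.length === 0) return;
      var oldest = entries[entries.length - 1][0];
      if (oldest > lastDumpNewest) { // whole window rolled over: dump again
        lastDumpNewest = entries[0][0];
        GM_xmlhttpRequest({
          method: 'POST',
          url: DUMP_URL,
          headers: { 'Content-Type': 'application/json' },
          data: JSON.stringify(entries)
        });
      }
    }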
Every 20 seconds the table on the page is replaced with the updated results. I had it updating in real time, but it moves too much to read and click on things. No need to preserve any DOM; DOM manipulation is slow, just feed a whole new table into the page as a string.
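Something like this (element id and markup simplified, and real code would escape the headline):

    setInterval(function () {
      var html = '<table>';
      for (var i = 0; i < entries.length; i++) {
        var e = entries[i]; // [timestamp, headline, host, path]
        html += '<tr><td>' + new Date(e[0]).toLocaleString() + '</td><td><a href="https://'
              + e[2] + e[3] + '">' + e[1] + '</a></td></tr>'; // scheme assumed
      }
      // no DOM surgery: swap the whole table in one go
      document.getElementById('results').innerHTML = html + '</table>';
    }, 20000);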
XHR requests must be async, which is a terrible technology. You don't know how many responses you are going to get per second. It has some relationship with the number of requests you make, but when making a lot of requests there is no way to avoid receiving a lot simultaneously, which, if you try to do anything with the response text, freezes the browser. There is no time to do more than push them into an array.
Then I use a setInterval to parse them; how many per tick depends on how far behind the parser is. setInterval gets delayed if the browser is busy, so parsing can be far behind. If it is too far behind, the number of requests per second is dialed down.
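The queue-and-drain pattern, roughly (numbers illustrative, parseFeed is the parser described below):

    var responseQueue = [];
    var requestsPerSecond = 10; // dialed down when the parser lags

    function onFeedResponse(resp) {     // onload handler for the XHRs
      responseQueue.push(resp.responseText); // no time for anything else here
    }

    setInterval(function () {
      // parse a bigger batch the further behind we are
      var batch = Math.max(1, Math.ceil(responseQueue.length / 10));
      if (responseQueue.length > 500) { // too far behind: slow the requests down
        requestsPerSecond = Math.max(1, requestsPerSecond - 1);
      }
      while (batch-- > 0 && responseQueue.length) parseFeed(responseQueue.shift());
    }, 100);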
Probably the funniest part is the parser.
It starts with a regex exec in a do{}while() loop. It looks for things that look like pubDates and compares up to 3 timestamps with the oldest pubDate in my result set. Most feeds never make it beyond that point.
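In spirit (exact regex simplified):

    var datelike = /<(?:pubDate|dc:date|published|updated)>([^<]+)</g;

    function looksFresh(text, oldestKept) {
      var m, checked = 0;
      datelike.lastIndex = 0; // reset the /g state between feeds
      do {
        m = datelike.exec(text);
        if (m && Date.parse(m[1]) > oldestKept) return true; // newer than our oldest entry
      } while (m && ++checked < 3); // compare up to 3 timestamps, then give up
      return false;
    }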
Then it tries to parse the feed by the rules: it looks for the string <item>, and if it finds that it looks for <title>, etc.
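Plain string search, no XML parser. Roughly:

    function parseFeed(text) {
      var items = [], pos = text.indexOf('<item>');
      while (pos !== -1) {
        var end = text.indexOf('</item>', pos);
        if (end === -1) break;
        var chunk = text.slice(pos, end);
        items.push({
          title:   between(chunk, '<title>', '</title>'),
          link:    between(chunk, '<link>', '</link>'),
          pubDate: between(chunk, '<pubDate>', '</pubDate>')
        });
        pos = text.indexOf('<item>', end);
      }
      return items;
    }

    function between(s, open, close) { // tiny helper, returns '' when missing
      var i = s.indexOf(open);
      if (i === -1) return '';
      var j = s.indexOf(close, i + open.length);
      return j === -1 ? '' : s.slice(i + open.length, j);
    }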
Lots of feeds are hilariously broken and invalid, so I have an arsenal of alternative approaches in the correct order of likelihood.
The funniest moment was when someone deleted their feed and redirected it to the website front page... but I never noticed. It tried to parse the "feed" every way it could and as a last resort looked for <a href=""> elements. Because it also couldn't find a pubDate of any kind, nothing that looked like a date, it took the /2023/01/03 part of the URL, made a timestamp from that, inserted it into the result set and happily continued.
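That date-from-the-path fallback is roughly:

    // mine the URL path for something that looks like /2023/01/03
    var pathDate = /\/(20\d\d)\/(\d\d?)\/(\d\d?)(?:\/|$)/;

    function dateFromUrl(url) {
      var m = pathDate.exec(url);
      return m ? new Date(+m[1], +m[2] - 1, +m[3]).getTime() : NaN;
    }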
Feeds that are slow get suspended: level 1 is 24 hours, level 5 is a week, and 5 weeks is the limit. I might try them again after a few more months.
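Something like this ladder (the in-between steps are approximate):

    var SUSPEND_DAYS = [0, 1, 2, 3, 5, 7]; // level 1 = 24h ... level 5 = a week

    function suspend(feed) {
      feed.level = Math.min((feed.level || 0) + 1, 5);
      feed.retryAt = Date.now() + SUSPEND_DAYS[feed.level] * 864e5; // ms per day
    }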
In monitor mode each function has its own tiny log window. The log() function is normally empty; modern JS skips it efficiently regardless of what kind of complex params are passed.
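Something like (pane creation simplified):

    var MONITOR = false; // flipped on in monitor mode

    function makeLog(name) {
      if (!MONITOR) return function () {}; // empty function, calls are near-free
      var pane = document.createElement('pre'); // one tiny log window per function
      pane.title = name;
      document.body.appendChild(pane);
      return function (msg) { pane.textContent += msg + '\n'; };
    }

    var logParser = makeLog('parseFeed'); // logParser('hi') is a no-op normally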
At times I have benchmarking things in there so that it chooses the fastest approach automatically.
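E.g. timing two candidate implementations and keeping the winner, in the simplest form:

    function faster(a, b, sample, runs) {
      var i, t0 = performance.now();
      for (i = 0; i < runs; i++) a(sample);
      var t1 = performance.now();
      for (i = 0; i < runs; i++) b(sample);
      var t2 = performance.now();
      return (t1 - t0) <= (t2 - t1) ? a : b; // keep whichever was faster
    }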
It can consume OPML (pretty much any number of them), but I prefer using flat lists of feed URLs.
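Feed URLs in OPML live in xmlUrl attributes on <outline> elements, so extraction can be as dumb as:

    function feedsFromOpml(opmlText) {
      var urls = [], re = /xmlUrl="([^"]+)"/g, m;
      while ((m = re.exec(opmlText)) !== null) urls.push(m[1]);
      return urls;
    }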
Apart from the (optional) dumps, the backend is just static files.
Because it runs in the browser it can also use credentials. It mixes in my Gmail feed too: https://mail.google.com/mail/feed/atom
Not really something I want to share with the world.
I will share this: http://salamisushi.go-here.nl
This is a Greasemonkey script; it detects feeds linked in the web pages you visit, displays an edit box with the feeds it found, and lets you export the list as an OPML file.
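Feed discovery is standardized well enough that the core is close to:

    // <link rel="alternate" type="application/rss+xml" ...> is the standard hook
    var found = [];
    var links = document.querySelectorAll(
      'link[rel="alternate"][type*="rss"], link[rel="alternate"][type*="atom"]');
    for (var i = 0; i < links.length; i++) {
      found.push({ title: links[i].title || document.title, url: links[i].href });
    }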
The web pages you visit are actually amazingly interesting to you personally, much more so than one would expect. Unsubbing the garbage goes much faster than finding interesting feeds manually one by one. If there is a lot of garbage, it is because you've visited too many garbage websites. It's funny to reflect on.
Thanks for your time