
663 points nikisweeting | 1 comments

We've been pushing really hard over the last 6mo to develop this release. I'd love to hear feedback from people who've worked on big plugin systems in the past, or anyone who's tried our betas!
the_gorilla ◴[] No.41861221[source]
I don't know how anyone manages to use archivebox. I've tried it twice in the last 3 years and its site compatibility is bad, it quietly leaks everything you archive to archive.org by default, and whenever it fails on a download it stops archiving anything even after deleting and resubmitting all the jobs.

I'm sure it works for some people, but not me.

replies(1): >>41861275 #
nikisweeting ◴[] No.41861275[source]
These are legitimate gripes that have plagued specific past releases, and I hear your frustration. Please keep in mind this was a solo effort by a single developer, worked on only in my spare time over the last 7 years (up until very recently).

The new v0.8 adds a BG queue specifically to deal with the issue of stalling when some sites fail. There was a system to do this in the past, but it was imperfect and mostly optimized for the docker setup where a scheduler is running `archivebox update` every few hours to retry failed URLs.

Site compatibility is much improved with the new BETA, but it's a perpetual cat-and-mouse game to fix specific sites, which is why we think the new plugin system is the way forward. It's just not sustainable for a single company (really just me right now) to maintain hundreds of workarounds for each individual site. I'm also discussing with the Webrecorder and Archive.org teams how we can share these site-specific workarounds as cross-compatible plugins (aka "behaviors") between our various software.

> it quietly leaks everything you archive to archive.org by default

It's prominently mentioned many times (at least 4) on our homepage that this is the default, and archiving public-only sites (which are already fair game for Archive.org) is a default for good reason. Archiving private content requires several important changes and security considerations. More context: https://news.ycombinator.com/item?id=26866689

replies(2): >>41861360 #>>41861432 #
1. freedomben ◴[] No.41861360[source]
Yeah, I'm not sure whether archive.org should be defaulted to on or off (I see both sides of that one), but its existence is definitely surfaced.

I love Archive Box btw, thank you for your effort! It's filling a very important need.