I'm sure it works for some people, but not me.
I'm sure it works for some people, but not me.
The new v0.8 adds a BG queue specifically to deal with the issue of stalling when some sites fail. There was a system to do this in the past, but it was imperfect and mostly optimized for the docker setup where a scheduler is running `archivebox update` every few hours to retry failed URLs.
Site compability is much improved with the new BETA, but it's a perpetual cat and mouse game to fix specific sites, which is why we think the new plugin system is the way forward. It's just not sustainable for a single company (really just me right now) to maintain hundreds of workarounds for each individual site. I'm also discussing with the Webrecorder and Archive.org teams how we can to share these site-specific workarounds as cross-compatible plugins (aka "behaviors") between our various software.
> it quietly leaks everything you archive to archive.org by default
It's prominently mentioned many times (at least 4) on our homepage that this is the default, and archiving public-only sites (which are already fair game for Archive.org) is a default for good reason. Archiving private content requires several important changes and security considerations. More context: https://news.ycombinator.com/item?id=26866689
Sending everything to archive.org is bad default value and it erodes a certain level of trust in the project. Requiring "several important changes and security considerations" just makes a non-starter. The default settings should be "safe" for the default user, because as you mentioned in that post, 90% of users are never going to change them. Users should be able to run it locally and archive data without worrying about security issues, unless you only want experts to be able to use your software.
Also a contradiction between your statement and your blogpost, someone saving their photos isn't going to be want to worry about whether they configured your tool correctly or leaking all the group logs or grandma's photos.
>It's prominently mentioned many times (at least 4) on our homepage that this is the default, and archiving public-only sites (which are already fair game for Archive.org) is a default for good reason. Archiving private content requires several important changes and security considerations. More context
> Who cares about saving stuff?
> All of us have content that we care about, that we want to see preserved, but privately:
> families might want to preserve their photo albums off Facebook, Flickr, Instagram
> individuals might want to save their bookmarks, social feeds, or chats from Signal/Discord
> companies might want to save their internal documents, old sites, competitor analyses, etc.
I want the project to do well but it really needs to be secure by default.