
128 points Brajeshwar | 2 comments
krick ◴[] No.42481217[source]
How do you back up websites? It sounds trivial, but I still haven't figured out a good way. I sometimes think I'd like a script that automatically makes a copy of every webpage I ever link in my notes (it happens quite often that a blog I linked some years ago is gone), and maybe even replaces the links with links to my own mirror. But every website I've actually backed up so far was either "old-web" and trivial to mirror, or required a custom grabber that I had to write myself. If you just want to copy a webpage, the result often has broken CSS and JS or missing images because the crawl was "too shallow", or else it goes too deep and pulls in a ton of tiny unnecessary files that are painful to keep on your filesystem as it grows. Add to that Cloudflare, CAPTCHAs, ads (which I don't see when browsing with uBlock and ideally wouldn't want in my mirrored sites either), cookie-warning splash screens, and all sorts of really simple (but still above wget's pay grade) anti-scraping measures. You get the idea.

Is there something that "just works"?
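
For the "every webpage I ever link in my notes" part, the link-collection step at least is easy to sketch. This is a minimal sketch, not a complete backup tool: it assumes the notes are plain-text or Markdown files under one directory, and the names `extract_links` and `links_in_notes` are hypothetical helpers made up for illustration. It only gathers the URLs; fetching and mirroring them (the hard part described above) is left out.

```python
import re
from pathlib import Path

# Rough URL pattern (an assumption, not a full URL grammar): stops at
# whitespace, quotes, angle brackets, and closing parens/brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    links = []
    for match in URL_RE.finditer(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def links_in_notes(notes_dir, pattern="*.md"):
    """Map each note file under notes_dir to the URLs it contains."""
    result = {}
    for path in sorted(Path(notes_dir).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        urls = extract_links(text)
        if urls:
            result[str(path)] = urls
    return result
```

The resulting list could then be handed to whatever capture tool you trust for the actual saving.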

replies(3): >>42481273 #>>42481558 #>>42482585 #
wis ◴[] No.42481558[source]
For saving a webpage you have open, I use a browser extension called SingleFile. I've been using it for a while (IIRC I discovered it on HN's front page a few years ago), and in my experience it "just works", and works really well.

You click the "browser action" icon/button of the extension and it saves a single HTML file that looks exactly like the webpage you have open.

From its FAQ[1] on GitHub:

  # What does SingleFile do?
  SingleFile is a browser extension designed to help users save web pages as complete, self-contained files. The extension's primary function is to capture an entire web page, including its HTML, CSS, JavaScript, images, and other resources, and package them into a single HTML file.

  # I am a web archivist, is it ok to use SingleFile to archive content?
  No, SingleFile is not a tool used by professionals to archive content on the Web, especially in the academic field. Professionals prefer to rely on tools based on the WARC specification instead.
[1] https://github.com/gildas-lormeau/SingleFile/blob/master/faq...
replies(1): >>42481638 #
1. throw0101a ◴[] No.42481638[source]
> For saving a webpage you have open

There's also print-to-PDF that most OSes now have.

replies(1): >>42481906 #
2. wis ◴[] No.42481906[source]
Yeah, pretty much all browsers on all OSes have print-to-PDF/save-to-PDF, but I prefer saving an HTML file over saving a PDF file for 3 reasons:

1. SingleFile lets me save an HTML file that looks exactly like the webpage I saved. I've never used a save-to-PDF feature in any browser that produced a PDF looking exactly like the webpage I was saving/printing. I wish browsers would implement that. Somebody did something similar once: they patched Chromium to save a web page as SVG[1]. AFAIK, if you can save to SVG you can also save to PDF without much modification to the code. Unfortunately, the fork is no longer maintained.

2. The HTML files that SingleFile creates are responsive (just like the webpage you had open); PDFs are not. I like that because it makes it easier to read the saved page on my phone later. With a PDF saved on your desktop, you have to pinch-zoom and pan to read it on your phone.

3. HTML files/webpages are accessible to screen readers, and my browser's extensions work on them. Extensions don't work on PDF files (they _can_ work on HTML files opened from disk, if you allow/enable it in the extension's settings).

[1] https://news.ycombinator.com/item?id=33584941