I would probably prefer to receive unmodified plain text/md versions (with the heavy lifting done by, e.g., docling or unstructured) over LLM summaries, though, since I’d rather produce my own distillations.
I would pay for that kind of thing. I think the intersection between ethical scraping and making things machine-readable is fertile ground. For a lot of companies it can be of great value, but it is also non-trivial to do well and unlikely to be a core competency in-house.