Maybe we could just publish a dump, in a standard format (WARC?), at a well-known address, and have the crawlers check there? The content could be regularly updated, with an ETag or similar header so that crawlers know when it's been updated.
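A minimal sketch of the crawler side, assuming a hypothetical well-known path like /.well-known/dump.warc.gz and using Python's requests for a conditional GET keyed on the ETag:

```python
import requests

# Hypothetical well-known location for the site's dump
DUMP_URL = "https://example.com/.well-known/dump.warc.gz"

def fetch_dump_if_changed(cached_etag=None):
    """Download the dump only if it changed since the last crawl."""
    headers = {}
    if cached_etag:
        headers["If-None-Match"] = cached_etag  # conditional GET using the cached ETag

    resp = requests.get(DUMP_URL, headers=headers, stream=True)
    if resp.status_code == 304:
        return None, cached_etag  # unchanged: nothing to download

    resp.raise_for_status()
    with open("dump.warc.gz", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
    return "dump.warc.gz", resp.headers.get("ETag")
```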
I suspect that even some dynamic sites could essentially snapshot themselves periodically, maybe once every few hours, and put the result up for download to satiate these crawlers while keeping the bulk of their serving capacity for actual humans.
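For the snapshot side, here's a rough sketch of how a site (or a sidecar job run every few hours from cron) could record its own pages into a WARC using the warcio library's capture_http helper; the page list and file name are just placeholders:

```python
from warcio.capture_http import capture_http
import requests  # warcio's docs ask that requests be imported after capture_http

# Hypothetical list of pages the site wants to expose in the dump
PAGES = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/posts",
]

def write_snapshot(path="dump.warc.gz"):
    """Record each page fetch into a WARC that crawlers can grab instead of hammering the site."""
    with capture_http(path):
        for url in PAGES:
            requests.get(url)

if __name__ == "__main__":
    write_snapshot()  # e.g. run every few hours from cron
```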