All digitized books ever written/encoded compress to a few TB. The public web is ~50TB. I think a usable zip of all publicly available English electronic text would be on the order of 100TB. So we're at about 1% of that in model size, and we're in a diminishing-returns region of training -- i.e., going beyond 1% has not yielded improvements (cf. GPT-4.5 vs GPT-4o).
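Rough arithmetic behind that 1% figure (the parameter count and the fp16 bytes-per-parameter are my assumptions, not published specs):

    # Back-of-envelope for the "~1%" claim above.
    # Corpus size is the estimate given in the comment; the model-size
    # assumption (~500B params at 2 bytes/param in fp16) is illustrative.
    corpus_tb = 100.0          # all public English electronic text, compressed
    params = 500e9             # assumed frontier-scale parameter count
    bytes_per_param = 2        # fp16/bf16 weights (assumption)

    model_tb = params * bytes_per_param / 1e12
    print(f"weights on disk: ~{model_tb:.1f} TB "
          f"({model_tb / corpus_tb:.0%} of the corpus)")
    # -> weights on disk: ~1.0 TB (1% of the corpus)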
This is why compute spend is moving to inference time with "reasoning" models. It's likely we're close to diminishing returns on inference-time compute too, hence agents, where (mostly) deterministic tools supplement the system with information and capability.
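For concreteness, the agent pattern I mean is roughly this loop: the model proposes a tool call, a deterministic tool returns ground truth, and the result is fed back into the context. This is an illustrative sketch with made-up names, not any particular framework; the "model" is a stub.

    from datetime import date

    def deterministic_tool(name: str, arg: str) -> str:
        """Things the model can't do reliably itself: exact arithmetic, dates, lookups."""
        tools = {
            "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy calculator
            "today": lambda _: date.today().isoformat(),
        }
        return tools[name](arg)

    def model_step(context: list[str]) -> dict:
        """Stand-in for an LLM call: decide whether to call a tool or answer."""
        if not any(line.startswith("TOOL_RESULT") for line in context):
            return {"action": "tool", "name": "calc", "arg": "37 * 43"}
        return {"action": "answer", "text": f"The product is {context[-1].split()[-1]}."}

    context = ["USER what is 37 * 43?"]
    while True:
        step = model_step(context)
        if step["action"] == "tool":
            result = deterministic_tool(step["name"], step["arg"])
            context.append(f"TOOL_RESULT {result}")  # inject the tool's output back into the loop
        else:
            print(step["text"])
            break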
I think to get any more value out of this model class, we'll be looking at domain-specific specialisation beyond instruction fine-tuning.
I'd guess ~1TB of inference-time VRAM is a reasonable medium-term target for high-quality open-source models -- that's within reach of most SMEs today. That's about 250bn params.
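Back-of-envelope on that (the precision and KV-cache split are my assumptions; the comment only gives the headline numbers):

    # Rough VRAM budget behind "~250B params in 1 TB".
    vram_tb = 1.0
    params = 250e9
    bytes_per_param = 2        # fp16/bf16 weights (assumption)

    weights_tb = params * bytes_per_param / 1e12
    headroom_tb = vram_tb - weights_tb   # left for KV cache, activations, batching
    print(f"weights: {weights_tb:.2f} TB, headroom: {headroom_tb:.2f} TB of {vram_tb} TB")
    # -> weights: 0.50 TB, headroom: 0.50 TB of 1.0 TB
    # At fp32 (4 bytes/param) the weights alone would fill the full 1 TB.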