It's a good start, but I was hoping for more data. Currently, it's only around 114 GB across 2 languages (<https://www.kaggle.com/datasets/wikimedia-foundation/wikiped...>):
- English: "Size of uncompressed dataset: 79.57 GB chunked by max 2.15GB."
- French: "Size of the uncompressed dataset: 34.01 GB chunked by max 2.15GB."
In 2025, the standards for ML datasets are quite high. replies(1):