I had a lot of problems using it with a dataset of many JPG files.
Every dvc status run took many minutes to index and check every file. Caching did not work.
Sadly I had to let go of it.
replies(1):
> The indexing for every dvc status took many minutes to check every file. Caching did not work.
> Sadly I had to let go of it.
If caching is not needed and streaming is required, we've created a sister tool, DataChain. It even supports WebDataset and can stream from tar archives and filter images by metadata.
WebDataset example: https://github.com/iterative/datachain/blob/main/examples/mu...