
132 points by fractalbits | 2 comments
kburman:
I feel like this product is optimizing for an anti-pattern.

The blog argues that AI workloads are bottlenecked by latency because of 'millions of small files.' But if you are training on millions of loose 4KB objects directly from network storage, your data pipeline is the problem, not the storage layer.

Data Formats: Standard practice is to use formats like WebDataset, Parquet, or TFRecord to pack small files into large, sequential blobs. That eliminates the need for high-IOPS metadata operations and makes standard S3 throughput, which is already plentiful, the only metric that matters.
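
A minimal sketch of that packing step, using only Python's stdlib tarfile (the directory, shard naming, and shard size are illustrative choices of mine; WebDataset consumes exactly this kind of .tar shard):

    import os
    import tarfile

    def pack_shards(sample_dir, shard_prefix, samples_per_shard=10_000):
        """Roll loose small files into large, sequential .tar shards."""
        files = sorted(os.listdir(sample_dir))
        for start in range(0, len(files), samples_per_shard):
            shard_name = f"{shard_prefix}-{start // samples_per_shard:06d}.tar"
            with tarfile.open(shard_name, "w") as tar:
                for name in files[start:start + samples_per_shard]:
                    tar.add(os.path.join(sample_dir, name), arcname=name)

    # One million 4KB objects become ~100 shards of ~40MB each:
    # sequential reads instead of a million metadata round trips.
    pack_shards("samples/", "train-shard")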

Caching: Most high-performance training jobs hydrate local NVMe scratch space on the GPU nodes; S3 is just the cold source of truth. We don't need sub-millisecond access to the source of truth, we need it at the edge (local disk/RAM), and the data loader's prefetching handles that.
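
A rough sketch of that hydration pattern (the bucket, keys, and scratch path are hypothetical; assumes boto3 is available):

    import os
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")
    SCRATCH = "/mnt/nvme/cache"  # hypothetical local NVMe scratch path

    def hydrate(bucket, keys, workers=16):
        """Prefetch shards from S3 (cold source of truth) onto local NVMe."""
        os.makedirs(SCRATCH, exist_ok=True)

        def fetch(key):
            local = os.path.join(SCRATCH, os.path.basename(key))
            if not os.path.exists(local):  # NVMe acts as the warm cache
                s3.download_file(bucket, key, local)
            return local

        # Worker threads overlap the network round trips; the training
        # loop reads completed shards at local-disk speed.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            yield from pool.map(fetch, keys)

As long as prefetch stays ahead of consumption, the GPU never waits on S3 latency, which is why sub-millisecond access to the source of truth buys little.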

It seems like they are building a complex distributed system to solve a problem that is better solved by tar -cvf.

fulafel:
You can do app-level optimizations to work with object stores that are slow for small objects, or you can have a fast object store; it doesn't seem that black and white. If you can build a fast object store that is robust and solves that problem well, it's (hopefully) a non-leaky abstraction that can warrant some complexity inside.

The tar -cvf comparison is a good analogy though: are you working with a virtual tape drive or a virtual SSD?

kburman:
Expecting the storage layer to fix an inefficient I/O pattern (millions of tiny network requests) is optimizing the wrong part of the stack.

> are you working with a virtual tape drive or a virtual SSD?

Treating a networked object store like a local SSD ignores the Fallacies of Distributed Computing. You cannot engineer away the speed of light or the TCP stack.
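
A back-of-envelope illustration (all numbers are assumed round figures, not measurements from the post):

    objects = 1_000_000      # one GET per 4KB object
    rtt = 500e-6             # assume 0.5ms in-region round trip
    throughput = 5e9         # assume 5 GB/s of sustained S3 throughput

    serial_wait = objects * rtt                  # 500s of pure network waiting
    bulk_read = objects * 4 * 1024 / throughput  # ~0.8s for the same bytes, packed

Concurrency can hide some of that waiting, but it can't remove the per-request cost; packing removes the requests themselves.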

fulafel:
SSD (over NVMe) and TCP (over 100GbE) both bottom out at low tens of microseconds of latency. That ignores redundancy in both cases, of course, but its cost should be similar for the two.

If the storage is farther away, you'll go slower, of course. But since the article compares against EFS and S3 Express, I think it's fitting to talk about a nearby scenario. And the article's point was that S3 Express was problematic more for cost than for small-object performance.