We built another object storage

1. pyrolistical [13 Dec 25 16:54 UTC] No.46255953

They claim AI workflows require:

1. Small Objects at Scale

2. Latency Sensitivity

3. The Need for Directories

I’m skeptical of the last one. They talk about rename performance as being the issue.

I think what they mean is that if you use the path as the object key and rename a directory in the middle of a path, you need to rename every object key that contains it.
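
To make the cost concrete, here is a minimal sketch of a path-as-key rename. It assumes a flat key-value store modeled as a plain dict; `rename_prefix` and the example keys are hypothetical and not taken from the product under discussion.

```python
def rename_prefix(store: dict[str, bytes], old: str, new: str) -> int:
    """Move every key under prefix `old` to prefix `new`; return how many keys were touched."""
    # Snapshot the matching keys first so the dict is not mutated while iterating.
    touched = [key for key in store if key.startswith(old)]
    for key in touched:
        store[new + key[len(old):]] = store.pop(key)
    return len(touched)


# Hypothetical flat store where the full path is the object key.
store = {
    "datasets/v1/train/0001.bin": b"a",
    "datasets/v1/train/0002.bin": b"b",
    "datasets/v1/eval/0001.bin": b"c",
    "models/ckpt.bin": b"d",
}

# Renaming one "directory" rewrites every key that sits under it.
moved = rename_prefix(store, "datasets/v1/", "datasets/v2/")
```

The work grows with the number of objects under the prefix, which is presumably the rename problem being described.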

But to me that is just a poor usage of an object store. You should never “rename” object keys.

Consider how git does it. If you rename a directory and diff it, the underlying object store didn’t rename any key. In fact, all the files in the object store are unchanged. Only the tree, which maps paths to file hashes, changed.

While renames would get faster that way, it would increase the latency of path-to-object-key lookups.

I would like to see how fundamental the requirement to have directories is to AI workflows. I suspect it’s a human “but I’m used to it” requirement.
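
A rough sketch of that tradeoff, assuming a much-simplified git-like layout: blobs are keyed by content hash, and a nested dict stands in for git's tree objects (in real git, trees are themselves hashed objects). The names `objects`, `root`, `put`, `get`, and `rename_dir` are hypothetical.

```python
import hashlib

objects: dict[str, bytes] = {}  # content hash -> bytes; these keys are never renamed
root: dict = {}                 # nested dicts: directory name -> subtree, file name -> content hash


def put(path: str, data: bytes) -> str:
    """Store data under its content hash and record the path in the tree."""
    digest = hashlib.sha256(data).hexdigest()
    objects[digest] = data
    *dirs, name = path.split("/")
    node = root
    for d in dirs:
        node = node.setdefault(d, {})
    node[name] = digest
    return digest


def get(path: str) -> bytes:
    """Resolve the path through the tree, then fetch by hash (the extra lookup step)."""
    node = root
    for part in path.split("/"):
        node = node[part]
    return objects[node]


def rename_dir(parent: dict, old: str, new: str) -> None:
    """Rename a directory by moving a single tree entry; no object key changes."""
    parent[new] = parent.pop(old)


put("datasets/v1/train/0001.bin", b"a")
put("datasets/v1/train/0002.bin", b"b")
keys_before = set(objects)
rename_dir(root["datasets"], "v1", "v2")
```

The rename touches one tree entry regardless of how many files sit under the directory, while every `get` now pays for walking the tree before it can reach the object.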

2. munchbunny [13 Dec 25 16:59 UTC] No.46255984

> I would like to see how fundamental the requirement to have directories is to AI workflows.

In my experience, it's not that directories are inherently important, it's that an organization mechanism is, in the service of a few key problems:

1. Privacy and data handling requirements

2. Versioning

3. Partitioning

4. Probably some others I'm forgetting

Hierarchical storage is a useful all-purpose tool for these things.

3. pyrolistical [13 Dec 25 19:20 UTC] No.46257097

How many of those problems are not solved by independent buckets (in the S3 sense)?