pyrolistical:
They claim AI workflows require:

1. Small Objects at Scale

2. Latency Sensitivity

3. The Need for Directories

I’m skeptical of the last one. They point to rename performance as the issue.

I think what they mean is that if you use the path as the object key, then renaming a directory in the middle of a path forces you to rewrite every object key that contains it.

But to me that is just poor use of an object store. You should never “rename” object keys.

Consider how git does it. If you rename a directory and diff it, the underlying object store doesn’t rename any key. In fact, all the file blobs in the object store are unchanged; only the tree object, which maps paths to file hashes, changes.
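A minimal sketch of that indirection (the names are hypothetical, not any real system’s API): blobs are keyed by content hash, and a single path-to-hash mapping plays the role of git’s tree, so a directory rename rewrites only small mapping entries:

```python
import hashlib

class TreeStore:
    """Content-addressed blobs plus a mutable path -> hash mapping,
    loosely in the spirit of git's blob/tree split."""

    def __init__(self):
        self.blobs = {}  # content hash -> bytes; keys never change
        self.tree = {}   # path -> content hash; the only mutable part

    def put(self, path, data):
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        self.tree[path] = key

    def get(self, path):
        # The extra indirection is the latency cost mentioned below:
        # every read is a path lookup followed by a hash lookup.
        return self.blobs[self.tree[path]]

    def rename_dir(self, old, new):
        # A rename rewrites only mapping entries; no blob is touched
        # or re-keyed. (git goes further: with nested tree objects, a
        # rename touches one tree node, not every entry under it.)
        prefix = old.rstrip("/") + "/"
        self.tree = {
            (new.rstrip("/") + "/" + p[len(prefix):] if p.startswith(prefix) else p): h
            for p, h in self.tree.items()
        }

store = TreeStore()
store.put("data/train/a.txt", b"hello")
store.rename_dir("data/train", "data/val")
assert store.get("data/val/a.txt") == b"hello"
```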

While renames would get faster that way, it would add latency to every lookup, since each path must first be resolved to an object key.

I would like to see how fundamental the requirement for directories actually is to AI workflows. I suspect it’s a human “but I’m used to it” requirement.

munchbunny:
> I would like to see how fundamental the requirement for directories actually is to AI workflows.

In my experience, it's not that directories are inherently important; it's that some organization mechanism is, in service of a few key problems:

1. Privacy and data handling requirements

2. Versioning

3. Partitioning

4. Probably some others I'm forgetting

Hierarchical storage is a useful all-purpose tool for these things.
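As a concrete illustration of how even a flat store approximates that hierarchy with key prefixes (a hypothetical layout; the bucket and key names are made up, and this assumes boto3 with AWS credentials already configured):

```python
import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")
BUCKET = "ml-datasets"  # hypothetical bucket name

# Encode the organization mechanism in the key itself: tenant is the
# privacy/data-handling boundary, v the version, part the partition.
key = "tenant=acme/dataset=corpus/v=3/part=0042/shard.parquet"
s3.put_object(Bucket=BUCKET, Key=key, Body=b"...")

# Prefix listing recovers a "directory" view without real directories;
# IAM policies can likewise be scoped to a prefix to enforce the boundary.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="tenant=acme/dataset=corpus/v=3/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```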

pyrolistical:
How many of those problems are not solved by independent buckets (in the S3 sense)?