That's baloney. The old ML adage "there's no data like more data" is as old as mankind itself.
That's baloney. The old ML adage "there's no data like more data" is as old as mankind itself.
Really? Did it take at least an entire rack to store?
my storage hierarchy goes 1) 1 storage drive 2) 1 server maxed out with the biggest storage drives available 3) 1 rack filled with servers from 2 4) 1 data center filled with racks from 3
It's a terrible measurement because it's an irrelevant detail about how their data is stored that no one actually knows if your data is being stored in a proprietary cloud except for people that work there on that team.
So while someone could say they used a 10 TiB data set, or 10T parameters, how many "racks" of AWS S3 that is, is not known outside of Amazon.
And whether your data can fit on a single server, single rack, or many racks will drastically affect how you design the infrastructure.