
621 points | sebg
randomtoast:
Why not use CephFS instead? It has been thoroughly tested in real-world scenarios and has demonstrated reliability even at petabyte scale. It is open source, runs on the fastest NVMe storage, and achieves very high IOPS over 10 Gigabit or faster interconnects.

I think their "Other distributed filesystem" section does not answer this question.

elashri:
CERN uses CephFS at the ~50 PB scale for a variety of applications and is happy with it.
dfc:
I thought they used Ceph too. But I started looking around and it seems like they have switched to CernVM-FS and an in-house solution. I'm not sure what changed.
amadio:
CERN is a heavy user of Ceph, with about 100 PB of data across CephFS, object stores (used as the backend for S3), and block storage (mostly for VM storage). CVMFS (https://cernvm.cern.ch/fs/) is used to distribute the software stacks of the LHC experiments across the WLCG (Worldwide LHC Computing Grid), and is backed by S3 on Ceph for its storage needs.

Physics data, however, is stored on EOS (https://eos.web.cern.ch), and CERN recently crossed the 1 EB mark of raw disk storage managed by EOS. EOS is also used as the storage solution for CERNBox (https://cernbox.web.cern.ch/), which holds user data.

Data analyses use ROOT and read the data remotely from EOS using XRootD (https://github.com/xrootd/xrootd), as EOS is itself based on XRootD. XRootD is very efficient at reading data across the network compared to other solutions. It is also used by experiments beyond high energy physics, for example by LSST in its clustered database Qserv (https://qserv.lsst.io).
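For concreteness, here is a minimal sketch of what that remote-read workflow looks like from ROOT: TFile::Open understands root:// URLs and streams the file over XRootD instead of copying it locally. The EOS host, file path, and tree name below are hypothetical placeholders, not actual CERN paths.

    // Minimal sketch: reading a remote file over XRootD with ROOT.
    // The URL and tree name are hypothetical placeholders.
    #include <TFile.h>
    #include <TTree.h>
    #include <iostream>
    #include <memory>

    int main() {
        // TFile::Open recognizes the root:// scheme and uses the XRootD client,
        // so data is streamed over the network rather than copied locally first.
        std::unique_ptr<TFile> f(
            TFile::Open("root://eos.example.cern.ch//eos/example/data.root"));
        if (!f || f->IsZombie()) {
            std::cerr << "could not open remote file\n";
            return 1;
        }

        // "Events" is a placeholder; real analyses read whatever tree the
        // experiment's data format defines.
        if (auto *tree = f->Get<TTree>("Events")) {
            std::cout << "entries: " << tree->GetEntries() << "\n";
        }
        return 0;
    }

This is just a sketch; compile it against a ROOT installation (e.g. with the flags from root-config) and point it at a file you actually have access to.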
CERN is a heavy user of ceph, with about 100PB of data across cephfs, object stores (used as backend for S3), and block storage (mostly for storage for VMs). CVMFS (https://cernvm.cern.ch/fs/) is used to distribute the software stacks used by LHC experiments across the WLCG (Worldwide LHC Computing Grid), and is back by S3 with ceph for its storage needs. Physics data, however, is stored on EOS (https://eos.web.cern.ch) and CERN just recently crossed the 1EB mark of raw disk storage managed by EOS. EOS is also used as the storage solution for CERNBox (https://cernbox.web.cern.ch/), which holds user data. Data analyses use ROOT and read the data remotely from EOS using XRootD (https://github.com/xrootd/xrootd), as EOS is itself based on XRootD. XRootD is very efficient to read data across the network compared to other solutions. It is also used by other experiments beyond high energy physics, for example by LSST in its clustered database called Qserv (https://qserv.lsst.io).