
621 points sebg | 1 comment | source
randomtoast ◴[] No.43717002[source]
Why not use CephFS instead? It has been thoroughly tested in real-world scenarios and has demonstrated reliability even at petabyte scale. As an open-source solution, it can run on the fastest NVMe storage, achieving very high IOPS over a 10-gigabit or faster interconnect.
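
A rough way to sanity-check an IOPS claim like this on your own hardware is a small random-read probe against a kernel-mounted CephFS path. Below is a sketch in Go using only the standard library; the mount path and test file are hypothetical placeholders, and it is deliberately naive (single thread, page cache not bypassed), so treat it as a smoke test rather than a real benchmark like fio.

    // iopsprobe: naive random 4 KiB read probe against a file on a CephFS mount.
    package main

    import (
        "fmt"
        "math/rand"
        "os"
        "time"
    )

    func main() {
        const (
            path     = "/mnt/cephfs/testfile" // hypothetical: a pre-created large file
            blockSz  = 4096                   // 4 KiB, the usual unit for IOPS figures
            duration = 5 * time.Second
        )

        f, err := os.Open(path)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        st, err := f.Stat()
        if err != nil {
            panic(err)
        }
        blocks := st.Size() / blockSz
        buf := make([]byte, blockSz)

        // Issue random block-aligned reads for a fixed wall-clock window.
        ops := 0
        deadline := time.Now().Add(duration)
        for time.Now().Before(deadline) {
            off := rand.Int63n(blocks) * blockSz
            if _, err := f.ReadAt(buf, off); err != nil {
                panic(err)
            }
            ops++
        }
        fmt.Printf("%d reads in %v (~%.0f IOPS)\n", ops, duration, float64(ops)/duration.Seconds())
    }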

I think their "Other distributed filesystem" section does not answer this question.

replies(4): >>43717453 #>>43717925 #>>43719471 #>>43721116 #
skrtskrt ◴[] No.43721116[source]
DigitalOcean uses Ceph underneath their S3 and block volume products. When I was there, they had two teams just managing Ceph itself; that's not even counting the control-plane services built on top.

It is a complete bear to manage and tune at scale. And DO never greenlit offering anything based on CephFS either, because it would have been a whole host of other things to manage.

Then of course you have to fight with the maintainers (Red Hat devs) to get any improvements contributed, assuming you even have team members with the requisite C++ expertise.

replies(1): >>43721371 #
Andys ◴[] No.43721371[source]
Ceph is massively over-complicated; if I had two teams, I'd probably try to write one from scratch instead.
replies(1): >>43721929 #
skrtskrt ◴[] No.43721929[source]
Most of the legitimate, datacenter-scale direct alternatives to Ceph are unfortunately proprietary, in part because proving out that scale takes so much money and so many expert hours that vendors want to recoup the cost and stay ahead.

MinIO is absolutely not datacenter-scale, and I would not expect anything written in Go to really reach that point. Garbage collection is a rough thing at such enormous scale.
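
The GC point shows up concretely in storage servers: the data path allocates large I/O buffers per request, and at high request rates the collector's CPU and pause cost grows with that churn. A common mitigation in Go is to recycle buffers through sync.Pool, as in this minimal standard-library sketch (the 1 MiB chunk size is an assumption, not anything MinIO-specific):

    // Reuse large I/O buffers so the GC doesn't have to track millions of
    // short-lived slabs on a hot data path.
    package main

    import (
        "fmt"
        "io"
        "strings"
        "sync"
    )

    const chunkSize = 1 << 20 // assumed 1 MiB object-chunk buffers

    // Pool pointers to slices to avoid re-boxing the slice header on each Put.
    var bufPool = sync.Pool{
        New: func() any {
            b := make([]byte, chunkSize)
            return &b
        },
    }

    // copyChunk streams src to dst through a pooled buffer instead of
    // allocating a fresh slice per request.
    func copyChunk(dst io.Writer, src io.Reader) (int64, error) {
        bp := bufPool.Get().(*[]byte)
        defer bufPool.Put(bp)
        return io.CopyBuffer(dst, src, *bp)
    }

    func main() {
        n, err := copyChunk(io.Discard, strings.NewReader("example payload"))
        if err != nil {
            panic(err)
        }
        fmt.Println("copied", n, "bytes through a pooled buffer")
    }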

I bet we'll get one in Rust eventually. Maybe from Oxide Computer Company? Though despite doing so much OSS, they seem to be focused on their specific server-rack OS, not general-purpose solutions.

replies(2): >>43722254 #>>43728084 #
steveklabnik ◴[] No.43722254[source]
> I bet we'll get one in Rust eventually. Maybe from Oxide Computer Company?

Crucible is our storage service: https://github.com/oxidecomputer/crucible

RFD 60, linked in the README, contains a bit of info about Ceph, which we did evaluate: https://rfd.shared.oxide.computer/rfd/0060

replies(1): >>43728146 #
sgarland ◴[] No.43728146{3}[source]
Fascinating! I thought you were using RAIDZ3 with some kind of clever wrapper (or just DRBD), but it’s much more complex than that.
replies(1): >>43730055 #
steveklabnik ◴[] No.43730055{4}[source]
It's not an area I personally work on, but yeah, there's a lot going on, and there will be more in the future. For example, I believe right now we ensure data integrity ourselves, but if you're running something (like Ceph) that already does that on its own, you're paying for it twice. So giving people options like that is important. It's a pretty interesting part of the space!
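
To make the "paying for it twice" point concrete: a layer that guarantees its own integrity typically checksums every block on write and re-verifies on read, so stacking two such layers doubles that per-I/O CPU work. Here is a minimal sketch of one such layer in Go, standard library only; it is illustrative, not how Crucible actually does it.

    // One storage layer's integrity check: CRC32C per block, verified on read.
    package main

    import (
        "bytes"
        "fmt"
        "hash/crc32"
    )

    var castagnoli = crc32.MakeTable(crc32.Castagnoli) // CRC32C, common in storage

    type block struct {
        data []byte
        sum  uint32
    }

    // writeBlock stores the data together with its checksum.
    func writeBlock(data []byte) block {
        return block{data: bytes.Clone(data), sum: crc32.Checksum(data, castagnoli)}
    }

    // readBlock re-hashes the data and refuses to return corrupt blocks.
    func readBlock(b block) ([]byte, error) {
        if crc32.Checksum(b.data, castagnoli) != b.sum {
            return nil, fmt.Errorf("checksum mismatch: block corrupt")
        }
        return b.data, nil
    }

    func main() {
        b := writeBlock([]byte("user data"))
        if _, err := readBlock(b); err != nil {
            panic(err)
        }
        fmt.Println("verified once; a second checksumming layer repeats this work per I/O")
    }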