
Are we decentralized yet?

(arewedecentralizedyet.online)
487 points Bogdanp | 2 comments
d4mi3n ◴[] No.45077410[source]
Neat! I'm not surprised at the findings here. Bluesky (for the average user) is pretty much a drop-in replacement for Twitter.

Despite the smaller total numbers in Mastodon, it's great to see that the ecosystem seems to be successfully avoiding the kind of centralization we've seen in the ATProto ecosystem.

I suspect that the cost of running ATProto servers/relays is prohibitive for smaller players compared to a Mastodon server selectively syndicating with a few peers, but I say this with only a vague understanding of the internals of both of these ecosystems.

replies(6): >>45077507 #>>45077986 #>>45078151 #>>45078889 #>>45079652 #>>45080382 #
danabramov ◴[] No.45077986[source]
>I suspect that the cost of running ATProto servers/relays is prohibitive for smaller players compared to a Mastodon server selectively syndicating with a few peers, but I say this with only a vague understanding of the internals of both of these ecosystems.

This isn't quite right. ATProto has a completely different "shape" so it's hard to make an apples-to-apples comparison.

Roughly speaking, you can think of Mastodon as a bunch of little independently hosted copies of Twitter that "email" (loosely speaking) each other to propagate information that isn't on your server. So it's cheap to run a server for a bunch of friends but it's cut off from what's happening in the world. Your identity is tied to your server (that's your webapp), and when you want to follow someone on another server, your server essentially asks that other server to send stuff to yours. This means that by default your view of the network is extremely fragmented — replies, threads, like counts are all desynchronized and partial[1] depending on which server you're looking from and which information is being forwarded to it.

ATProto, on the other hand, is designed with the goal of actually being competitive with centralized services. This means that it's partitioned differently: it's not "many Twitters talking to each other", which is Mastodon's model. Instead, in ATProto, there is a separation of concerns: you have swappable hosting (your hosting is the source of truth for your data like posts, likes, follows, etc) and you have applications (which aggregate data from the former). This might remind you of the traditional web: it's like every social media user posts JSON to "their own website" (i.e. hosting) while apps aggregate all that data, similar to how Google Reader might aggregate RSS. As a result, in ATProto, the default behavior is that everyone operates with a shared view of the world: you always see all replies and all comments, all likes are counted, etc. It's not partial by default.
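
To make the contrast concrete, here is a toy Python sketch. It is not either protocol's real API: `Post`, `instance_view`, `appview_view`, and the sample data are all made up for illustration, and it ignores the actual ActivityPub delivery and forwarding rules. It only models which replies each kind of server ends up knowing about.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Post:
    author: str              # account that wrote the post
    home: str                # host where the author's data lives
    reply_to: Optional[str]  # author being replied to, or None for a top-level post
    text: str


# Hypothetical sample data: one post with replies coming from two other hosts.
POSTS = [
    Post("alice", "a.example", None, "hello world"),
    Post("bob", "b.example", "alice", "hi alice"),
    Post("carol", "c.example", "alice", "welcome!"),
    Post("dave", "c.example", "alice", "nice post"),
]


def instance_view(local_host: str, followed: Set[str]) -> List[Post]:
    """Federation-style view (simplified): a server knows posts by its own
    users plus posts by authors that its users follow."""
    return [p for p in POSTS if p.home == local_host or p.author in followed]


def appview_view() -> List[Post]:
    """Aggregation-style view: the app indexes records from every host."""
    return list(POSTS)


def replies_to(view: List[Post], author: str) -> List[str]:
    """Texts of all replies to `author` that are visible in `view`."""
    return [p.text for p in view if p.reply_to == author]
```

In this toy model, `replies_to(instance_view("b.example", {"alice"}), "alice")` only finds bob's reply, while `replies_to(appview_view(), "alice")` finds all three.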

With this difference in mind, "decentralizing" ATProto is sort of multidimensional. In Mastodon, the only primitive is an "instance" — i.e. an entire Twitter-like webapp you can host for your users. But in ATProto, there are multiple decentralized primitives:

- PDS (personal data hosting) is an application-agnostic data store. Bluesky's implementation is open source (it uses one SQLite database per user). There are also alternative implementations of the same protocol. Bluesky the company does operate the largest ones. However, running a PDS for yourself is extremely cheap (like maybe $1/mo?). It's basically just structured KV JSON storage organized as a Merkle tree. A bit like Git hosting.

- AppViews are actual "application backends". Bluesky operates the bsky.app appview, i.e. what people know as the Bluesky app. Importantly, in ATProto, there is no reason for everyone to run their own AppView. You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that). Of course, if you were happy with the tradeoffs chosen by Mastodon (a partial view of the network, where you only see what your server's users follow), you could run that for a lot cheaper, which is why I'm saying it's not apples-to-apples. ATProto makes it easy to have an actually cohesive experience on the network, but its costs are usually compared with the fragmented experience of Mastodon. ATProto can scale down to Mastodon-like UX (with Mastodon-like costs) but it's just not very appealing when you can have the real thing.

- Relays are things "in between" PDS's and AppViews. Essentially a Relay is just an optimization to avoid many-to-many connections between AppViews and PDS's. A Relay just rebroadcasts updates from all PDS's as a single stream (that AppViews can subscribe to). Running a Relay used to be expensive but it got a lot cheaper since "Sync 1.1" (when a change in protocol allowed Relays to be non-archiving). Now it costs about $30/mo to run a Relay.

So all in all, running PDSs and Relays is cheap. Running full AppViews is more expensive but there's simply no equivalent to that in the Mastodon world because Mastodon is always fragmented[1]. And running a partial AppView (comparable to Mastodon behavior) should be much, much cheaper — but also not very appealing so I don't know anyone who's actually doing that. (It would also require adding a bit of code to filter out the stuff you don't care about.)

[1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.
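
The three primitives above can be sketched as a small pipeline. This is a toy model under loud assumptions: `ToyPDS`, `relay`, `ToyAppView`, and the `(owner, collection, record)` event shape are hypothetical names for illustration, not the real ATProto data model or firehose format, and a real Relay interleaves updates in real time rather than concatenating logs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (repo owner, collection name, record text).
Event = Tuple[str, str, str]


class ToyPDS:
    """Source of truth for the records of the users it hosts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Event] = []

    def put(self, owner: str, collection: str, record: str) -> None:
        self.log.append((owner, collection, record))


def relay(pdses: Iterable[ToyPDS]) -> Iterator[Event]:
    """Rebroadcast every PDS's updates as one stream, archiving nothing."""
    for pds in pdses:
        yield from pds.log


class ToyAppView:
    """Indexes events from a relay stream; a filter makes the index partial."""

    def __init__(self, keep: Callable[[Event], bool] = lambda event: True) -> None:
        self.keep = keep
        self.index: Dict[str, List[str]] = defaultdict(list)

    def ingest(self, stream: Iterable[Event]) -> None:
        for event in stream:
            if self.keep(event):
                owner, collection, record = event
                self.index[collection].append(f"{owner}: {record}")


pds_a, pds_b = ToyPDS("a"), ToyPDS("b")
pds_a.put("alice", "post", "hello")
pds_b.put("bob", "post", "hi")
pds_b.put("bob", "like", "alice/hello")

full = ToyAppView()                                   # indexes everything
full.ingest(relay([pds_a, pds_b]))

partial = ToyAppView(keep=lambda e: e[0] == "alice")  # only cares about alice
partial.ingest(relay([pds_a, pds_b]))
```

The `keep` filter stands in for the "bit of code to filter out the stuff you don't care about": the partial AppView consumes the same stream but stores only a slice of it.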

replies(4): >>45078344 #>>45081740 #>>45081898 #>>45087265 #
ttiurani ◴[] No.45081898[source]
> You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that).

A clarifying question: in the blog post [0] I found about zeppelin.social, which I think is a full AppView, the author said this:

"The cost to run this is about US $200/mo, primarily due to the 16 terabytes of storage it currently uses"

Last I heard the amount of storage was just a couple of terabytes so the growth seems to be very fast.

If and when the primary cost is the storage, IMO the crucial question is: what's the expected future cost of running community AppViews?

Because unless storage cost drops as fast as the Bluesky data grows (unlikely?), to me this architecture looks like it will very soon kick out smaller players and leave only Bluesky with enough money to keep the AppView running.

[0] https://whtwnd.com/futur.blue/3ls7sbvpsqc2w
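
The concern can be put as back-of-the-envelope arithmetic. The sketch below uses only the two figures from the quote ($200/mo, 16 TB); the growth and price-decline rates are hypothetical placeholders, not measurements, and it assumes storage dominates the bill and scales linearly with data size.

```python
COST_NOW_USD = 200.0  # per month, from the quoted post
STORAGE_TB = 16.0     # from the quoted post

# Upper bound on the price per TB: treats the whole bill as storage.
PRICE_PER_TB = COST_NOW_USD / STORAGE_TB  # 12.5 USD per TB per month


def projected_monthly_cost(cost_now: float, data_growth: float,
                           price_decline: float, years: int) -> float:
    """Monthly cost after `years`, if stored data grows by `data_growth`
    per year and the price per TB falls by `price_decline` per year
    (both given as fractions, e.g. 0.25 for 25%)."""
    yearly_factor = (1 + data_growth) * (1 - price_decline)
    return cost_now * yearly_factor ** years


# Hypothetical scenarios, NOT measurements:
# data +25%/year vs. price -20%/year: the two effects cancel out.
flat = projected_monthly_cost(COST_NOW_USD, data_growth=0.25,
                              price_decline=0.20, years=5)
# data doubling every year vs. price -20%/year: cost compounds at 1.6x/year.
steep = projected_monthly_cost(COST_NOW_USD, data_growth=1.00,
                               price_decline=0.20, years=3)
```

The cost stays flat only when `(1 + data_growth) * (1 - price_decline)` equals 1; any faster data growth compounds year over year.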

replies(1): >>45081935 #
danabramov ◴[] No.45081935[source]
I can’t speak to how fast it grows and what it was, but I mean — if what you want is to keep the entire data of the network (similar to having all tweets on Twitter) ready to be queried then you have to store them. That’s just unavoidable in any technological solution. Alternatively you could hydrate and query posts on-demand from their sources (PDS), and people have done that as an experiment, but you need at least some aggregation to happen somewhere (for reconstructing reply lists or like lists etc). If more collectively-run network caches are available, this becomes more feasible without storing everything yourself.

In any case, if you’re okay with a partial snapshot of the network (e.g. all posts during some window, or something even more partial) then you can narrow that down arbitrarily. In Mastodon, having a “full” archive is downright impossible, which is why nobody asks the same question about Mastodon. ATProto makes it possible, and the cost is the floor of what you’d expect storing that much data to cost. How could it be better?
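
As one illustration of "narrowing down", here is a minimal sketch of a store that keeps only a recent time window. `WindowedStore` is a hypothetical name, and the sketch assumes posts arrive in non-decreasing timestamp order; it is not how any existing AppView is implemented.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedStore:
    """Keeps only posts from the last `window_seconds`; older ones are evicted.

    Assumes `add` is called with non-decreasing timestamps.
    """

    def __init__(self, window_seconds: int) -> None:
        self.window = window_seconds
        self.posts: Deque[Tuple[int, str]] = deque()  # (timestamp, text), oldest first

    def add(self, timestamp: int, text: str) -> None:
        self.posts.append((timestamp, text))
        cutoff = timestamp - self.window
        # Drop everything that has fallen out of the window.
        while self.posts and self.posts[0][0] < cutoff:
            self.posts.popleft()
```

Storage is then bounded by the posting rate times the window length rather than by the full history of the network.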

replies(2): >>45082002 #>>45082013 #
1. ttiurani ◴[] No.45082002[source]
> if what you want is to keep the entire data of the network (similar to having all tweets on Twitter) ready to be queried then you have to store them.

They need to be stored, but do they technically have to be stored by just one AppView? I get that it's 100x easier to implement it like that, but I don't think a distributed search would've been technically impossible (although, granted, it would necessarily have had worse UX).

Choosing this feature and then implementing it like they did was a technical choice. Technical choices have consequences, and this, I think, is the one that will prevent Bluesky from reaching any meaningful decentralization.

And saying "you can create an inferior UX with affordable costs" is not a real answer. Any meaningful decentralization IMO can only happen if it's affordable to create feature-identical nodes. That can only happen if you refuse to implement features in ways that need centralization to scale.

replies(1): >>45083373 #
2. danabramov ◴[] No.45083373[source]
I’m not sure what choice you’re referring to here. Yes, simply loading data from the database is the most straightforward solution, and that’s what Bluesky itself did for its AppView. That’s kind of the default model in general in web development — and has nothing to do with decentralization. If you were running a centralized Twitter, storing the amount that Twitter stores would cost you exactly the same.

On the contrary, ATProto adds flexibility here. There are community-run projects like https://constellation.microcosm.blue/ that let small application builders avoid that burden. Of course you don’t want to overwhelm those by building a massive app on top. But the point is that ATProto starts with a baseline equivalent to what you’d pay running a centralized service, and then gives you room to play with the distribution of costs, potentially going all the way down to directly querying PDS’s on-demand, or something in between like community-maintained caches or even potential third-party app-agnostic aggregation services. E.g. you could imagine AWS, Vercel or Cloudflare building “app platforms” in five years that let you cheaply query shared data.

As for creating “identical” nodes, I think you hit the nail on the head — that’s not what ATProto aims to do. The insight is that it’s not useful or feasible for everyone to run their own copy of Twitter. But that it’s possible for everyone with “proportional interest” to run a “proportionally complete” part, with some of the costs being amortizable and poolable across many users and apps (thanks to shared infrastructure) and always individually replaceable (to avoid lock-in). This is strictly better than centralized.