285 points ajhit406 | 11 comments
stavros No.41832728
This is a really interesting design, but these kinds of smart systems always inhabit an uncanny valley for me. You need them in exactly two cases:

1. You have a really high-load system and need to figure out some clever ways to scale it.

2. You're working on a toy project for fun.

If #2, fine, use whatever you want, it's great.

If this is production, or for Work(TM), you need something proven. If you don't know you need this, you don't need it, go with a boring Postgres database and a VM or something.

If you do know you need this, then you're kind of in a bind: it's pretty new and not very mature yet, and you're probably going to hit a bunch of weird edge cases that you don't really want to have to debug or live with.

So, who are these systems for, in the end? They're so niche that they can't easily mature and be used by lots of serious players, and they're too complex with too many tradeoffs to be used by 99.9% of companies.

The only people I know for sure are the target market for this sort of thing are the developers who see something shiny, build a company (or, worse, build someone else's company) on it, and then regret it pretty soon and move to something else (hopefully much more boring).

Does anyone have more insight on this? I'd love to know.

replies(8): >>41832813 #>>41832877 #>>41832980 #>>41832987 #>>41833057 #>>41833093 #>>41833218 #>>41835368 #
1. yen223 No.41833057
I almost have the opposite view:

When starting out you can get away with using a simple Postgres database. Postgres is fine for low-traffic projects with minimal latency constraints, and you probably want to spend your innovation tokens elsewhere.

But in very high-traffic production cases with tight latency requirements, you will start to see all kinds of weird and wacky traffic patterns that barebones Postgres won't be able to handle. It's usually in these cases that you need to start exploring alternatives to Postgres, and also in these cases that you can afford to hire people to manage your special database needs.

replies(1): >>41833100 #
2. simonw No.41833100
Have you worked on any examples of projects that started on PostgreSQL and ended up needing to migrate to something specialized?
replies(1): >>41833373 #
3. yen223 No.41833373
I did, twice.

The second time, we had a reporting system that eventually stored billions of rows per day in a Postgres database. Processing times got so bad that we decided to migrate to Clickhouse, which substantially improved query times. I maintain that we hadn't exhausted every available Postgres optimisation, but I can't deny that the migration made sense in the long run - OLTP vs OLAP and all that.

(The first time is a funny story that I'm not quite ready to share.)
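
To make the OLTP-vs-OLAP point concrete, here's a minimal sketch of the kind of table ClickHouse is built for (assuming the clickhouse-driver Python client; the table and column names are made up, not our actual reporting schema):

    # pip install clickhouse-driver
    from clickhouse_driver import Client

    client = Client("localhost")

    # Columnar MergeTree table, partitioned by day. Appending billions
    # of rows per day is cheap, and a query that touches two columns
    # reads only those two columns from disk.
    client.execute("""
        CREATE TABLE IF NOT EXISTS report_events (
            event_date Date,
            account_id UInt64,
            metric     Float64
        )
        ENGINE = MergeTree
        PARTITION BY event_date
        ORDER BY (account_id, event_date)
    """)

    # A typical reporting aggregate: a scan over two columns,
    # never over whole rows.
    totals = client.execute("""
        SELECT account_id, sum(metric)
        FROM report_events
        WHERE event_date >= today() - 7
        GROUP BY account_id
    """)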

replies(3): >>41833415 #>>41833713 #>>41833821 #
4. simonw No.41833415
That makes a lot of sense to me. One of my strongest hints that a non-relational data store might be a good idea is "grows by billions of rows a day".
replies(1): >>41833708 #
5. adhamsalama No.41833708
Isn't Clickhouse relational?
replies(3): >>41833752 #>>41833758 #>>41833828 #
6. adhamsalama No.41833713
Well, this isn't specific to Postgres, is it?

If you were storing billions of rows per day in MySQL, SQL Server, or Oracle, they still wouldn't be able to handle it, would they?

replies(1): >>41833748 #
7. yen223 No.41833748
That's right. The key difference is row-based vs column-based storage (i.e. OLTP vs OLAP). Any good database person should cringe at the thought of using Postgres (or MySQL, Oracle, SQL Server, etc.) for pulling reporting data.

That said, no regrets about using Postgres there. If we had started with Clickhouse, the project might not have launched as quickly as it did, and that would have given us more problems.
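
A toy illustration of the row-vs-column difference, in plain Python (hypothetical data, just to show what each layout pays for):

    # Row store (OLTP-style): each record is kept together, so an
    # aggregate over one field still walks every whole row.
    rows = [
        {"id": 1, "user": "a", "amount": 10.0},
        {"id": 2, "user": "b", "amount": 12.5},
        {"id": 3, "user": "a", "amount": 7.5},
    ]
    total_from_rows = sum(r["amount"] for r in rows)

    # Column store (OLAP-style): each field is its own contiguous
    # array, so the same aggregate scans one array and skips the rest.
    columns = {
        "id": [1, 2, 3],
        "user": ["a", "b", "a"],
        "amount": [10.0, 12.5, 7.5],
    }
    total_from_columns = sum(columns["amount"])

    assert total_from_rows == total_from_columns == 30.0

At billions of rows, "reads one column" vs "reads every row" is the whole ballgame for reporting queries.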

8. crabmusket No.41833752
It does allow you to query with SQL, but it's meant for OLAP workloads, not OLTP. Its internal architecture and storage are different from what you'd usually think of as a relational database like Postgres. See https://clickhouse.com/docs/en/concepts/why-clickhouse-is-so...

The term "relational" is overloaded. Sometimes it means "you can use SQL" and sometimes it means "OLTP with data stored in an AoS btree".

(And sometimes, a pet peeve of mine, it means "data with relationships", which is based on a misunderstanding of the term "relation". If someone asks you whether "your data is relational", they are suffering from this confusion.)

9. yen223 No.41833758
Clickhouse is a SQL database, so I guess it is?

(Strictly speaking, since a "relation" in the original Codd-paper sense is a table, anything with tables is relational. I don't know if that's what people actually mean by "relational", and I don't know what would count as "non-relational" in that sense.)

10. xarope No.41833821
Right, OLTP vs OLAP are very different workloads (using the car analogy, that would be like using a Ferrari to tow a trailer, and an F250 to... oh wait, an F250 can do anything!).

But seriously: even if you stay on Postgres, as a former DBA (DB2 and Oracle) I would have tuned the OLTP database very differently from the OLAP database. And I don't mean just indexes; even during ETL from OLTP->OLAP you might decide to de-normalize columns on the OLAP side simply to speed up queries. (OLAP databases are the sort of database you were warned about, where indexes can be 10x the data size.)
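
A minimal sketch of that ETL denormalization step, using Python's sqlite3 as a stand-in (the schema is hypothetical):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Normalized OLTP side: fact rows reference a dimension table.
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             amount REAL, order_date TEXT);

        -- Denormalized OLAP side: customer attributes are copied onto
        -- every fact row so reporting queries never pay for the join.
        CREATE TABLE orders_olap (order_id INTEGER, customer_name TEXT,
                                  region TEXT, amount REAL, order_date TEXT);
    """)

    # The join happens once, at load time, instead of once per query.
    db.execute("""
        INSERT INTO orders_olap
        SELECT o.id, c.name, c.region, o.amount, o.order_date
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """)

You trade storage and update complexity for query speed, which is the right trade on the OLAP side.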

11. simonw No.41833828
Kind of? By "relational" there I meant "traditional relational databases like MySQL and PostgreSQL that are optimized for transactions and aren't designed for large-scale analytics".