←back to thread

1298 points jgrahamc | 2 comments | | HN request time: 0.159s | source
Show context
jgrahamc ◴[] No.22883548[source]
I posted this hours ago and then stepped away. The story captures so much about the Lee I knew so well. I'll add one piece of praise for Lee's early architecture of Cloudflare.

Everything was controlled by a single Postgres database that made very heavy use of stored procedures, that called other procedures, that called others. It was one giant program inside the database. It took me a while to comprehend what he'd done but it was really great. The database ran everything and all those functions made sure that audit logs were kept, that the calls were allowed for the user ID being passed in, and some of these procedures made external calls to APIs including getting things like SSL certificates.

It was a magnificent monolith inside a database.

I worked on the periphery of the database (it was truly Lee's domain) and he'd tell me what output to expect or API to create and I'd code to his spec. and we'd just hook it up.

If any single artefact represents what he did at Cloudflare, it's that database. And he used to code it on a laptop we called "The Beast" because it was so crazily heavy and overloaded with memory etc. that he'd carry around a mini, test Cloudflare wherever he went.

replies(2): >>22883751 #>>22883786 #
mynameishere ◴[] No.22883751[source]
Working with stored procedures is just the pits. I wonder why you would describe that as "great". Maybe it's optimized, but so would be a giant executable written in assembly.
replies(4): >>22883916 #>>22883978 #>>22884300 #>>22884664 #
crazygringo ◴[] No.22884664[source]
As long as you have the necessary tooling for source control, versioning, code search, etc. -- why is it the pits?

I mean, if you stick to a rigorous procedure of automatically exporting your stored procedures to files and committing them to git with messages... how is that different from any other development?

Serious question -- I've done tons of database work but only a little bit with stored procedures, so wondering if I'm missing a crucial detail here.

replies(4): >>22886392 #>>22886876 #>>22887472 #>>22892698 #
1. danielbarla ◴[] No.22887472[source]
It's a completely valid question, so I'll try to give a serious answer, from my perspective. Firstly, I'd like to say that SPs can be quite awesome, for several of the reasons already stated. It is normally one of the few ways one can truly enforce integrity, access controls, and logging at the DB level. (Also, most of my experience is from an MS SQL and C# background, so some of this may not translate over to PostgreSQL quite so well)

1. Verbosity and fitness for purpose: while there is a kind of beauty in pure set-based SQL querying, stored procedures by their nature tend to be a procedural paradigm, and in my opinion, they are not particularly well suited to it. They are insanely verbose by today's standards, and doing anything typically requires several times more code than the equivalent in most modern languages (your mileage may vary with different database engines!). This results in large, monolithic SPs, which are difficult to read, understand, and test. The "standard library" is generally far less extensive than mainstream languages, leading to sub-standard, verbose, slow implementations (string concatenations and manipulations are a good example in MS SQL).

2. Code navigation, dependency management: the inverse of point 1 - in general, the tooling around databases for code searching, navigation, etc, is quite woeful in comparison to non database languages. In MS SQL, dependency management is so unreliable, that the official tooling misses several classes of dependencies, just depending on the type of syntax you happened to use. In your question, you start by saying that if you have the necessary tooling already obtained, it's all good. In my experience, rarely are even "barely good enough" tools in place, let alone anything close to what exists in other environments.

3. Unit testing: since databases tend to be stateful, the very concept of unit testing is a bit of an alien one. People don't tend to write SPs with unit tests in mind, and at least in the MS SQL world, the few "unit testing" frameworks that did exist tended to be incredibly painful to use - to the point that they were generally not worth it. Brittle, expensive integration tests are not the same.

4. Execution plan caching: the way databases handle caching of SP execution plans leaves much to be desired - generally the first set of parameters that an SP gets called with end up determining the execution plan for all future calls. Even in well managed code bases, this ends up being an issue, occasionally needing a DBA to manually go in, identify a misbehaving execution plan, and evict it. Vary by param, and no-cache options do not fully solve this. While the need for such mechanisms is obvious, it is surprising that these have been a pain for 20+ years. Now, execution plan caching also happens for normal SQL too, so my point is rather that the culture around SPs tends to have an unreasonable fear of dynamic SQL, which would be far more appropriate _where the query actually is of a dynamic nature!_

I can imagine people replying "but clearly, you're doing things that you shouldn't be - that's not what SPs are for!". And I agree, in a large part that is my point too. But then, what exactly should SPs be used for?

replies(1): >>22888975 #
2. crazygringo ◴[] No.22888975[source]
Thank you so much for writing that up!