How to Get Foreign Keys Horribly Wrong

(hakibenita.com)

63 points Bogdanp | 1 comments | 15 Jul 25 11:36 UTC | HN request time: 0.206s | source

Show context

aidos ◴[18 Jul 25 20:29 UTC] No.44609465[source]▶

I’ve done a lot of interviewing and I’ve discovered that many devs (even experienced ones) don’t understand the difference between indexes and foreign keys.

My assumption is that people have used orms that automatically add the index for you when you create a relationship so they just conflate them all. Often they’ll say that a foreign key is needed to improve the performance and when you dig into it, their mental model is all wrong. The sense they have is that the other table gets some sort of relationship array structure to make lookups fast.

It’s an interesting phenomenon of the abstraction.

Don’t get me wrong, I love sqlalchemy and alembic but probably because I understand what’s happening underneath so I know the right way to hold it so things are efficient and migrations are safe.

replies(5): >>44609923 #>>44610030 #>>44610036 #>>44611356 #>>44611914 #

alexjplant ◴[19 Jul 25 02:05 UTC] No.44611914[source]▶

>>44609465 #

During a work meeting I once suggested using a non-PK column in a Postgres database for a foreign key. A coworker confidently said that we shouldn't because joins would be slow. I pointed out that we could create an index on that column and they rebutted by claiming that PKs created some kind of "special" index. I didn't want to burn goodwill and so didn't push it further but it always struck me as silly.

Depending upon the database storage engine, available memory, and table size I could see there being _some_ performance hit if only PKs are used for statistics but I'd think that modern RDBMSes are smart enough to cache appropriately. Am I missing something?

replies(1): >>44612590 #

quectophoton ◴[19 Jul 25 04:26 UTC] No.44612590[source]▶

>>44611914 #

> and they rebutted by claiming that PKs created some kind of "special" index

Maybe they were thinking about something like the "clustered indexes" from SQL Server, and mistakenly thought PostgreSQL also worked like that:

> "When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table doesn't already exist and you don't specify a unique nonclustered index." [1]

> "Clustered indexes sort and store the data rows in the table or view based on their key values." [2]

So I'm guessing you could squeeze some extra performance for certain access patterns, maybe? I have not worked at any place where I had needed to worry about low level details like this, though, so obligatory disclaimer to take this comment with a grain of salt due to my lack of first-hand experience.

[1]: https://learn.microsoft.com/en-us/sql/relational-databases/i...

[2]: https://learn.microsoft.com/en-us/sql/relational-databases/i...

replies(1): >>44615766 #

1. whatevaa ◴[19 Jul 25 14:30 UTC] No.44615766[source]▶

>>44612590 #

In index oriented tables, primary keys are special. Table is organized by primary key and secondary indexes point to primary.

In postgres, primary key is basically unique index with some special semantics.

↑