Most active commenters
  • aidos(4)

←back to thread

60 points Bogdanp | 13 comments | | HN request time: 0.658s | source | bottom
1. aidos ◴[] No.44609465[source]
I’ve done a lot of interviewing and I’ve discovered that many devs (even experienced ones) don’t understand the difference between indexes and foreign keys.

My assumption is that people have used orms that automatically add the index for you when you create a relationship so they just conflate them all. Often they’ll say that a foreign key is needed to improve the performance and when you dig into it, their mental model is all wrong. The sense they have is that the other table gets some sort of relationship array structure to make lookups fast.

It’s an interesting phenomenon of the abstraction.

Don’t get me wrong, I love sqlalchemy and alembic but probably because I understand what’s happening underneath so I know the right way to hold it so things are efficient and migrations are safe.

replies(5): >>44609923 #>>44610030 #>>44610036 #>>44611356 #>>44611914 #
2. whyowhy3484939 ◴[] No.44609923[source]
Very strange if you ask me and disturbing. I don't know if I'd let such a dev touch a database. Of course nowadays we just vibe code and YOLO everything, but still. This is making me feel old.
3. bevr1337 ◴[] No.44610030[source]
> their mental model is all wrong.

Is it? In Postgres, all FK references must be to a column with a PK or unique constraint or part of another index. Additionally, Postgres and Maria (maybe all SQL?) automatically create indexes for PKs and unique constraints. There's a high likelihood that a foreign key is already indexed _in the other table_.

Generally, I agree with your statement. Adding a FK won't magically improve performance or create useful indices. But, the presence of a FK or refactoring to support a FK does (tangentially) point back to that index.

replies(3): >>44610282 #>>44610714 #>>44610850 #
4. Fishkins ◴[] No.44610036[source]
Huh, that's interesting. Mixing indexes and FKs is a major conceptual error.

FWIW, I've also asked everyone I've interviewed in the past decade about indexes and FKs. Most folks I've talked to seem to understand FKs. They're often fuzzier on the details of indexes, but I don't recall anyone conflating the two.

replies(1): >>44610888 #
5. ak39 ◴[] No.44610282[source]
By definition, a FK has to reference a PK in the “parent”.
replies(1): >>44610905 #
6. UltraSane ◴[] No.44610714[source]
Yes. Not understanding the difference means you really don't understand the relational model. It would be like a network engineer not understanding the difference between IP and MAC addresses.
7. aidos ◴[] No.44610850[source]
I wasn’t totally clear on my original statement. As you point out, the referenced columns in the referenced table need to have a unique constraint and that’s done with a unique index. My understanding is that this ensures there’s no ambiguity as to which row is referenced and allows for efficient enforcement of the FK constraint.

Django automatically creates an index on the referencing table to ensure that joins are fast. The fact that you have the relationship in the ORM means that’s how you’re likely to access the data so it makes perfect sense.

The mental model mismatch I’ve seen is that people appear to think of the relationship as being on the parent object “pointing” at the child table.

replies(1): >>44612391 #
8. aidos ◴[] No.44610888[source]
I guess it depends on how much time you’ve spent in a relational db. For people who mostly interact with them via an orm, I can see where the confusion comes from.
9. aidos ◴[] No.44610905{3}[source]
Not quite. It can reference any combination of columns with a unique index (of with the PK is by definition).
10. hobs ◴[] No.44611356[source]
An index is one thing (and important and good), but an FK allows you to completely eschew IO if done right. In other words "I guarantee that all values in this list exist in that list" is a great simple optimization path and some sql engines can use it to avoid joining data or checking for existence at all.
11. alexjplant ◴[] No.44611914[source]
During a work meeting I once suggested using a non-PK column in a Postgres database for a foreign key. A coworker confidently said that we shouldn't because joins would be slow. I pointed out that we could create an index on that column and they rebutted by claiming that PKs created some kind of "special" index. I didn't want to burn goodwill and so didn't push it further but it always struck me as silly.

Depending upon the database storage engine, available memory, and table size I could see there being _some_ performance hit if only PKs are used for statistics but I'd think that modern RDBMSes are smart enough to cache appropriately. Am I missing something?

replies(1): >>44612590 #
12. bevr1337 ◴[] No.44612391{3}[source]
I'll admit my experience in Django is only migrating customers off Django. Thanks for adding some interesting details about that ecosystem
13. quectophoton ◴[] No.44612590[source]
> and they rebutted by claiming that PKs created some kind of "special" index

Maybe they were thinking about something like the "clustered indexes" from SQL Server, and mistakenly thought PostgreSQL also worked like that:

> "When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table doesn't already exist and you don't specify a unique nonclustered index." [1]

> "Clustered indexes sort and store the data rows in the table or view based on their key values." [2]

So I'm guessing you could squeeze some extra performance for certain access patterns, maybe? I have not worked at any place where I had needed to worry about low level details like this, though, so obligatory disclaimer to take this comment with a grain of salt due to my lack of first-hand experience.

[1]: https://learn.microsoft.com/en-us/sql/relational-databases/i...

[2]: https://learn.microsoft.com/en-us/sql/relational-databases/i...