
121 points by b-man | 1 comment
jandrewrogers No.44026812
This takes an overly simple view of what domains can look like. There are data models that necessarily violate these principles, and they aren’t all that rare.

Some examples:

> A relation should be identified by a natural key that reflects the entity’s essential, domain-defined identity

In some domains there is no natural key because identity is literally an inference problem and relations are probabilistic. The objective of the data model is to aggregate enough records to discover and attribute natural keys with some level of confidence. A common class of data models with this property is entity-resolution models.
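A minimal sketch of that idea, with all names and thresholds invented for illustration: records arrive with per-source ids but no domain-wide natural key, and identity is inferred by clustering on attribute similarity, with a confidence attached to each resolved entity.

```python
# Entity-resolution sketch (hypothetical schema): identity is inferred,
# not declared, so clusters of records stand in for "entities".
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Record:
    source_id: str   # per-source id; NOT a domain-wide natural key
    name: str
    email: str

def similarity(a: Record, b: Record) -> float:
    """Crude match score: exact email match dominates, else name similarity."""
    if a.email and a.email == b.email:
        return 1.0
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()

def resolve(records, threshold=0.85):
    """Greedy single-link clustering: each cluster becomes one inferred entity,
    tagged with the weakest link that joined it (a rough confidence)."""
    clusters = []
    for r in records:
        for cluster in clusters:
            score = max(similarity(r, m) for m in cluster["members"])
            if score >= threshold:
                cluster["members"].append(r)
                cluster["confidence"] = min(cluster["confidence"], score)
                break
        else:
            clusters.append({"members": [r], "confidence": 1.0})
    return clusters

recs = [
    Record("crm:17", "Jon Smith", "jsmith@example.com"),
    Record("billing:92", "Jonathan Smith", "jsmith@example.com"),
    Record("crm:40", "Ada Lovelace", "ada@example.org"),
]
entities = resolve(recs)
# The two Smith records share an email, so two entities are resolved.
```

Real systems replace the toy similarity function with trained matchers and blocking strategies, but the structural point is the same: the key is an output of the model, not an input.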

> All information in the database is represented explicitly and in exactly one way

Some data models have famously dual natures. Cartographic data models, for example, must be represented both as graph models (for routing and reachability relationships) and as geometric models (for spatial relationships). The "one true representation" has been a perennial argument in mapping for my entire life, and both sides are demonstrably correct.
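A toy illustration of that dual nature (segment ids and coordinates invented): the same road segments feed a graph view that answers reachability questions and a geometric view that answers spatial ones, and neither view alone answers both.

```python
# One shared fact table of road segments:
# (segment_id, from_node, to_node, polyline)
from collections import defaultdict
from math import hypot

segments = [
    ("s1", "A", "B", [(0.0, 0.0), (1.0, 0.0)]),
    ("s2", "B", "C", [(1.0, 0.0), (1.0, 2.0)]),
    ("s3", "D", "E", [(5.0, 5.0), (6.0, 5.0)]),
]

# Graph view: adjacency lists for routing/reachability queries.
adj = defaultdict(set)
for _, u, v, _ in segments:
    adj[u].add(v)
    adj[v].add(u)

def reachable(src, dst):
    """Depth-first search over the graph view."""
    seen, stack = {src}, [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        for m in adj[n] - seen:
            seen.add(m)
            stack.append(m)
    return False

# Geometric view: polylines for spatial queries such as total length.
def length(polyline):
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]))

# reachable("A", "C") is True, reachable("A", "D") is False;
# the polylines sum to length 4.0.
```

The "one way" principle breaks here because the two views are derived projections of the same facts, and a practical system materializes both.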

> Every base relation should be in its highest normal form (3rd, 5th, or 6th normal form).

This is one of those things that sounds attractive only because it assumes away ambiguities about domain boundaries and semantics, and those ambiguities always exist in practice. I bought into this idea too when I was a young and naive data modeler. Trying to tamp out these ambiguities adds an unbounded number of data model epicycles that add a lot of complexity and performance loss. At some point, strict normalization is simply not worth the cost.
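One concrete face of that cost, sketched with an invented customer schema: in a 6NF-style decomposition every non-key attribute lives in its own relation, so reconstructing a single logical row costs one join per attribute rather than one probe.

```python
# Hypothetical schema. Wide (denormalized) relation: one probe per question.
wide = {"cust-7": {"name": "Acme", "region": "EU", "tier": "gold"}}

# 6NF-style decomposition: one binary relation per attribute predicate.
name = {"cust-7": "Acme"}
region = {"cust-7": "EU"}
tier = {"cust-7": "gold"}

def read_wide(key):
    return wide[key]  # 1 lookup

def read_6nf(key):
    # 3 lookups (joins) to reassemble the same tuple; real workloads pay
    # this per attribute, which is the "epicycle" cost described above.
    return {"name": name[key], "region": region[key], "tier": tier[key]}

assert read_wide("cust-7") == read_6nf("cust-7")
```

Dictionaries stand in for indexed relations here; in a real DBMS each extra relation also carries its own index maintenance, locking, and planner cost.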

In almost all cases, it is far more important that the data model be efficient to work with than that it be the abstract platonic ideal of a domain model. All of these principles have to work on real hardware in real operational environments, with all of the messy limitations that implies.

replies(2): >>44028007 >>44036521
b-man No.44036521
> Trying to tamp out these ambiguities adds an unbounded number of data model epicycles that add a lot of complexity and performance loss

If you can talk about a business rule, you have a predicate. If you have a predicate, you can put it in 5th or 6th normal form, since all that means is that your relation expresses the predicate completely and nothing else.
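A sketch of that reading, with invented predicates and data: each business rule becomes exactly one relation, and queries are conjunctions of predicates rather than probes of a wide row.

```python
# Predicate 1: "employee E works in department D"
works_in = {("alice", "eng"), ("bob", "sales")}

# Predicate 2: "employee E is certified for skill S"
certified = {("alice", "sql"), ("alice", "gis")}

def engineers_with(skill):
    """Conjunction of the two predicates: works_in(E, "eng") AND certified(E, skill)."""
    return {e for (e, d) in works_in
            if d == "eng" and (e, skill) in certified}
```

Each relation here asserts one predicate and nothing else, which is the sense in which it "expresses only and completely the predicate"; the disagreement upthread is about what that discipline costs once predicates multiply.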

It seems that your definition of normalization is not the one that I am using above. What is it?