SQL, Homomorphisms and Constraint Satisfaction Problems

(www.philipzucker.com)

153 points by xlinux | 3 comments | 20 Nov 24 17:07 UTC


mbid [20 Nov 24 19:11 UTC] No.42197125

>>42195994 (OP)

The post mentions the idea that querying a database D can be understood algebraically as enumerating all morphisms Q -> D, where Q is the "classifying" database of the query, i.e., a minimal database instance that admits a single "generic" result of the query. You can use this to give a neat formulation of Datalog evaluation. A Datalog rule then corresponds to a morphism P -> H, where P is the classifying database instance of the rule body and H is the classifying database instance for matches of both body and head. For example, for the transitivity rule

  edge(x, z) :- edge(x, y), edge(y, z).

you'd take P to be the database instance containing the two rows (a_1, a_2) and (a_2, a_3), and H to be the instance that additionally contains (a_1, a_3). Saying that a database D satisfies this rule then means that every morphism P -> D (i.e., every match of the rule's body) can be completed to a commuting diagram

  P --> D
  |    ^
  |   /
  ⌄  /
  H

where the vertical arrow is the morphism P -> H and the additional map is the arrow H -> D, which corresponds to a match of both body and head.
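The following is a minimal Python sketch of both ideas. Enumerating morphisms Q -> D by backtracking yields the results of the query that Q classifies. A rule P -> H holds in D when every morphism P -> D extends to a morphism H -> D.

The sketch rests on a few assumptions:
- A database instance is represented as a set of `(relation, row)` atoms.
- P is a sub-instance of H, so the rule's morphism is an inclusion.
- The names `extend`, `homomorphisms` and `satisfies` are hypothetical helpers. They are not APIs of eqlog or egglog.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Assumed representation: an instance is a set of (relation name, row) atoms.
Atom = Tuple[str, Tuple[str, ...]]


def extend(atoms: List[Atom], d: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of the partial map h that sends each atom in
    `atoms` to an atom of d with the same relation name."""
    if not atoms:
        yield h
        return
    (rel, row), rest = atoms[0], atoms[1:]
    for rel_d, target in d:
        if rel_d != rel or len(target) != len(row):
            continue
        new = dict(h)
        # setdefault binds unmapped elements and checks already-mapped ones.
        if all(new.setdefault(x, y) == y for x, y in zip(row, target)):
            yield from extend(rest, d, new)


def homomorphisms(q: Set[Atom], d: Set[Atom]) -> List[Dict[str, str]]:
    """All morphisms q -> d, i.e. all results of the query classified by q.
    Assumes every element of q occurs in some row of q."""
    return list(extend(sorted(q), d, {}))


def satisfies(p: Set[Atom], h_inst: Set[Atom], d: Set[Atom]) -> bool:
    """Lifting property: every morphism P -> D extends to a morphism H -> D.
    Assumes p is a subset of h_inst, so only the extra atoms need matching."""
    extra = sorted(h_inst - p)
    return all(
        next(extend(extra, d, m), None) is not None
        for m in extend(sorted(p), d, {})
    )


# Transitivity rule edge(x, z) :- edge(x, y), edge(y, z).
P = {("edge", ("a_1", "a_2")), ("edge", ("a_2", "a_3"))}
H = P | {("edge", ("a_1", "a_3"))}

path = {("edge", ("u", "v")), ("edge", ("v", "w"))}
matches = homomorphisms(P, path)  # the single match a_1->u, a_2->v, a_3->w
ok_before = satisfies(P, H, path)  # False: edge(u, w) is missing
ok_after = satisfies(P, H, path | {("edge", ("u", "w"))})  # True
```

Matching H only on its extra atoms, starting from the match m of P, guarantees that the resulting map H -> D restricts to m on P. That restriction is exactly the commuting triangle in the diagram.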

This kind of phenomenon is known in category theory as a "lifting property", and there's a rich theory around it. For example, you can show in great generality that there's always a "free" way to add data to a database D so that it satisfies the lifting property (the orthogonal reflection construction, or the small object argument). Those are the theoretical underpinnings of the Datalog engine I'm sometimes working on [1]. There, they allow you to prove that Datalog evaluation is also well-defined if you allow new elements to be adjoined during evaluation in a controlled way. I believe the author of this post is involved in the egglog project [2], which might have similar features.
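The "free" completion should coincide with the familiar least fixpoint in the simplest case: rules without equality, whose heads mention only elements already in the body. Naive bottom-up evaluation computes that fixpoint. Whenever a match P -> D has no lift H -> D, it adds the image of H's extra rows, and it repeats until every lifting problem is solved.

This is a minimal sketch under stated assumptions:
- Instances are sets of `(relation, row)` atoms.
- Each rule is a pair (P, H) with P a subset of H.
- `close` is a hypothetical name.

It is a plain Datalog fixpoint, not eqlog's actual algorithm, and it handles neither equality nor new elements.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # assumed: (relation name, row)
Rule = Tuple[Set[Atom], Set[Atom]]  # assumed: (P, H) with P a subset of H


def extend(atoms: List[Atom], d: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of the partial map h sending each atom into d."""
    if not atoms:
        yield h
        return
    (rel, row), rest = atoms[0], atoms[1:]
    for rel_d, target in d:
        if rel_d == rel and len(target) == len(row):
            new = dict(h)
            if all(new.setdefault(x, y) == y for x, y in zip(row, target)):
                yield from extend(rest, d, new)


def close(rules: List[Rule], d: Set[Atom]) -> Set[Atom]:
    """Naive fixpoint: add head rows until every match P -> D lifts to H -> D.
    Assumes every element of H already occurs in P (no existentials)."""
    d = set(d)  # work on a copy; the input instance is left unchanged
    changed = True
    while changed:
        changed = False
        for p, h_inst in rules:
            extra = sorted(h_inst - p)
            # Materialize the matches before d is modified below.
            for m in list(extend(sorted(p), d, {})):
                if next(extend(extra, d, m), None) is None:
                    # No lift exists: add the image of H's extra rows under m.
                    d.update((rel, tuple(m[x] for x in row)) for rel, row in extra)
                    changed = True
    return d


P = {("edge", ("a_1", "a_2")), ("edge", ("a_2", "a_3"))}
H = P | {("edge", ("a_1", "a_3"))}
chain = {("edge", ("a", "b")), ("edge", ("b", "c")), ("edge", ("c", "d"))}
closure = close([(P, H)], chain)  # adds edges a->c, b->d and a->d
```

Because no new elements are ever created, the loop runs over a fixed finite set of possible rows and so must terminate.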

[1] https://github.com/eqlog/eqlog

[2] https://github.com/egraphs-good/egglog


1. bubblyworld [21 Nov 24 05:36 UTC] No.42201448

>>42197125

A very interesting perspective on Datalog that I hadn't heard before, thanks. How far does it go? For instance, can you interpret extensions of Datalog (say, negation or constrained existentials) in a nice categorical way? I've given this very little thought, but I imagine you'd have issues with the uniqueness of these "minimal" database instances, and I'm not sure what that means for these lifting properties.

(if my question even makes sense, pardon the ignorance)


2. mbid [21 Nov 24 12:03 UTC] No.42203479

>>42201448 (TP)

If you're interested in the details, you might want to have a look at papers [1] or [2].

You can add existentials in this framework, which basically means that the lifting problems mentioned above don't need to have unique solutions. But, as you say, the "minimal" databases are then no longer determined uniquely up to isomorphism, so the result of Datalog evaluation can depend on the order in which you apply rules.
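The order dependence shows up already in a toy example. This is a minimal sketch under stated assumptions:
- Instances are sets of `(relation, row)` atoms.
- The evaluation strategy is a restricted chase: a rule fires only on a match that has no lift yet, and elements of H missing from P are mapped to fresh elements.
- The function `chase`, the rules `r1` and `r2`, and the fresh names `n0`, `n1`, ... are hypothetical illustrations, not eqlog's semantics.

Rule r1 says every `p(x)` has some `r(x, y)`, and rule r2 says `q(x)` implies `r(x, x)`. Start from `{p(a), q(a)}`:
- Firing r1 first adjoins a new element.
- Firing r2 first already solves r1's lifting problem, so r1 never fires.

The two results therefore have 4 and 3 rows and cannot be isomorphic.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # assumed: (relation name, row)
Rule = Tuple[Set[Atom], Set[Atom]]  # assumed: (P, H) with P a subset of H


def extend(atoms: List[Atom], d: Set[Atom], h: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of the partial map h sending each atom into d."""
    if not atoms:
        yield h
        return
    (rel, row), rest = atoms[0], atoms[1:]
    for rel_d, target in d:
        if rel_d == rel and len(target) == len(row):
            new = dict(h)
            if all(new.setdefault(x, y) == y for x, y in zip(row, target)):
                yield from extend(rest, d, new)


def chase(rules: List[Rule], d: Set[Atom]) -> Set[Atom]:
    """Restricted chase: fire a rule only on matches P -> D with no lift H -> D,
    sending elements of H that are missing from P to fresh elements.
    May not terminate for arbitrary existential rules."""
    d = set(d)
    fresh = itertools.count()
    changed = True
    while changed:
        changed = False
        for p, h_inst in rules:
            extra = sorted(h_inst - p)
            for m in list(extend(sorted(p), d, {})):
                if next(extend(extra, d, m), None) is not None:
                    continue  # this lifting problem is already solved
                full = dict(m)
                for _, row in extra:
                    for x in row:
                        if x not in full:
                            full[x] = f"n{next(fresh)}"  # adjoin a new element
                d.update((rel, tuple(full[x] for x in row)) for rel, row in extra)
                changed = True
    return d


r1 = ({("p", ("x",))}, {("p", ("x",)), ("r", ("x", "y"))})  # p(x) -> exists y. r(x, y)
r2 = ({("q", ("x",))}, {("q", ("x",)), ("r", ("x", "x"))})  # q(x) -> r(x, x)
start = {("p", ("a",)), ("q", ("a",))}

first = chase([r1, r2], start)   # {p(a), q(a), r(a, n0), r(a, a)}
second = chase([r2, r1], start)  # {p(a), q(a), r(a, a)}
```

Both results satisfy both rules. Only the order of rule application differs.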

If I recall correctly, [3] discusses a logic corresponding to accessible categories (just as Datalog + equality corresponds to locally presentable categories), and that logic includes the theory of fields. The theory of fields involves the negation 0 != 1, so perhaps it gives you a nicer way to incorporate negation without stratification.

[1] https://www.mbid.me/eqlog-semantics/

[2] https://arxiv.org/abs/2205.02425

[3] Locally presentable and accessible categories, https://www.cambridge.org/core/books/locally-presentable-and...


3. bubblyworld [21 Nov 24 16:51 UTC] No.42206185

>>42203479

Thanks for the references, those papers look great! Will dig into them this evening =)
