←back to thread

Getting AI to write good SQL

(cloud.google.com)
480 points richards | 1 comments | | HN request time: 0.202s | source
Show context
mritchie712 ◴[] No.44010031[source]
the short answer: use a semantic layer.

It's the cleanest way to give the right context and the best place to pull a human in the loop.

A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.

With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLM's are much more consistent at writing a small JSON vs. hundreds of lines of SQL.

We[0] use cube[1] for this. It's the best open source semantic layer, but there's a couple closed source options too.

My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.

0 - https://www.definite.app/

1 - https://cube.dev/

2 - https://news.ycombinator.com/item?id=25930190

replies(7): >>44010358 #>>44011108 #>>44011775 #>>44011802 #>>44012638 #>>44013043 #>>44013772 #
ljm ◴[] No.44010358[source]
> you get the added benefit of writing queries in JSON instead of raw SQL.

I’m sorry, I can’t. The tail is wagging the dog.

dang, can you delete my account and scrub my history? I’m serious.

replies(6): >>44010404 #>>44010459 #>>44011117 #>>44011630 #>>44011681 #>>44013578 #
fhkatari ◴[] No.44010459[source]
You move all the tools to debug and inspect slow queries, in a completely unsupported JSON environment, with prompts not to make up column names. And this is progress?
replies(2): >>44010596 #>>44011364 #
mritchie712 ◴[] No.44011364[source]
The JSON compiles to SQL. Have you used a semantic layer? You might have a different opinion if you tried one.
replies(2): >>44012209 #>>44021252 #
1. ljm ◴[] No.44021252[source]

    SELECT email FROM users WHERE deleted_at IS NOT NULL OR status = 'active'
seems more semantic to me at first glance than piping this into a JSON->SQL library

    {
      "_select": "email",
      "_table": "users",
      "_where": { 
        "deleted_at": { "_is": { "_not": SQL_NULL_VALUE } },
        "_or": [
          { "status": "inactive" },
        ]
      }
    }
which is usually how these things end up looking.