
Getting AI to write good SQL

(cloud.google.com)
478 points by richards | 3 comments

mritchie712
The short answer: use a semantic layer.

It's the cleanest way to give the right context and the best place to pull a human in the loop.

A human can define and validate all the important metrics (e.g., what "monthly active users" really means), and then an LLM can use that metric definition whenever it's asked for MAU.

With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLMs are much more consistent at producing a small JSON payload than hundreds of lines of SQL.
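
For a sense of what that looks like in practice, here's a rough, hypothetical sketch of the kind of query an LLM would emit against a Cube-style semantic layer, written as a Python dict that serializes to the JSON payload. The measure name "users.monthly_active_users" is made up for illustration; the point is that a human defines and validates that measure once, and the LLM only has to reference it.

    import json

    # Hypothetical Cube-style query. The measure was defined (and validated by
    # a human) in the semantic layer; the LLM only has to produce this small
    # object instead of hundreds of lines of SQL.
    mau_query = {
        "measures": ["users.monthly_active_users"],
        "timeDimensions": [{
            "dimension": "users.created_at",
            "granularity": "month",
            "dateRange": "last 6 months",
        }],
        "filters": [{
            "member": "users.plan",
            "operator": "equals",
            "values": ["pro"],
        }],
    }

    print(json.dumps(mau_query, indent=2))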

We[0] use Cube[1] for this. It's the best open-source semantic layer, but there are a couple of closed-source options too.

My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.

0 - https://www.definite.app/

1 - https://cube.dev/

2 - https://news.ycombinator.com/item?id=25930190

tclancy
Mother of God. I can write JSON instead of a language designed for querying. What is the advantage? If I’m going to move up an abstraction layer, why not give me natural language? Lots of things turn a limited natural language grammar into SQL for you. What is JSON going to {do: for: {me}}?

Spivak
I find it funny that people are making fun of this when every ORM builds up an object representing the query and then compiles it to SQL. "SQL, but as a data structure you can manipulate" has thousands of implementations because it solves a real problem. This time it's because LLMs have an easier time outputting complex JSON than SQL itself.
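
To make that concrete, here's a minimal sketch of "the query as a data structure" using SQLAlchemy Core (my example, nothing from the article; the tables are made up). The query exists as a plain Python object you can inspect and compose, and only becomes SQL text at the end.

    from sqlalchemy import (
        Column, Integer, MetaData, String, Table, func, select,
    )

    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("plan", String),
    )
    events = Table(
        "events", metadata,
        Column("user_id", Integer),
    )

    # Built up as a data structure: composable and inspectable before it is
    # ever rendered as SQL.
    query = (
        select(func.count(func.distinct(events.c.user_id)))
        .select_from(events.join(users, events.c.user_id == users.c.id))
        .where(users.c.plan == "pro")
    )

    # Only compiled to SQL text at the very end.
    print(query)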

ses1984
Any idea why that is?

ljm
Something still has to convert the JSON to SQL, but in this case, who is writing the JSON? An LLM?

Sometimes, it's easier or more efficient to just learn the shit you're working with instead of spending 1000x the compute fobbing it off to OpenAI. Even just putting a blob of SQL in a heredoc and using a prepared statement to parameterise it is good enough.
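
For what it's worth, this is roughly what I mean; a minimal sketch using Python's sqlite3 and a triple-quoted string standing in for a heredoc (table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, occurred_at TEXT)")

    # The SQL lives in one readable block; values go in as bound parameters,
    # never via string formatting.
    MAU_SQL = """
        SELECT COUNT(DISTINCT user_id)
        FROM events
        WHERE occurred_at >= :since
    """

    (mau,) = conn.execute(MAU_SQL, {"since": "2024-01-01"}).fetchone()
    print(mau)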

Beyond that, query building is just part of the functionality an ORM provides. The main chunk of it is the mapping of DB-level data structures to app-level models and vice-versa, particularly in an OOP context.