Getting AI to write good SQL

(cloud.google.com)

501 points richards | 1 comments | 16 May 25 21:10 UTC | HN request time: 0.203s | source

Show context

mritchie712 ◴[16 May 25 21:43 UTC] No.44010031[source]▶

the short answer: use a semantic layer.

It's the cleanest way to give the right context and the best place to pull a human in the loop.

A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.

With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLM's are much more consistent at writing a small JSON vs. hundreds of lines of SQL.

We[0] use cube[1] for this. It's the best open source semantic layer, but there's a couple closed source options too.

My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.

0 - https://www.definite.app/

1 - https://cube.dev/

2 - https://news.ycombinator.com/item?id=25930190

replies(7): >>44010358 #>>44011108 #>>44011775 #>>44011802 #>>44012638 #>>44013043 #>>44013772 #

tclancy ◴[17 May 25 03:14 UTC] No.44011775[source]▶

>>44010031 #

Mother of God. I can write JSON instead of a language designed for querying. What is the advantage? If I’m going to move up an abstraction layer, why not give me natural language? Lots of things turn a limited natural language grammar into SQL for you. What is JSON going to {do: for: {me}}?

replies(2): >>44012372 #>>44013758 #

Spivak ◴[17 May 25 12:21 UTC] No.44013758[source]▶

>>44011775 #

I find it funny people are making fun of this while every ORM builds up an object representing the query and then compiles it to SQL. SQL but as a data structure you can manipulate has thousands of implementations because it solves a real problem. This time it's because LLMs have an easier time outputting complex JSON than SQL itself.

replies(2): >>44014898 #>>44021201 #

1. ses1984 ◴[17 May 25 15:11 UTC] No.44014898[source]▶

>>44013758 #

Any idea why that is?

↑