Getting AI to write good SQL

1. mousetree ◴[16 May 25 22:02 UTC] No.44010169[source]▶

Out of all the AI tools and models I’ve tried, the most disappointing is the Gemini built into BigQuery. Despite having well named columns with good descriptions it consistently gets nowhere close to solving the problem.

replies(2): >>44010326 #>>44010346 #

2. quantadev ◴[16 May 25 22:27 UTC] No.44010326[source]▶

>>44010169 (TP) #

Having proper constraints and foreign keys that are clear is generally all that's needed in my experience. Are you sure your tables have well defined constraints, so that the AI can be absolutely 100% sure how everything links up? SQL is very precise, but only if you're utilizing constraints and foreign key definitions well.

replies(1): >>44010802 #

3. flysand7 ◴[16 May 25 22:31 UTC] No.44010346[source]▶

>>44010169 (TP) #

Having written more SQL than any other programming language by now, every time I've tried to use AI to write the query for me, I'd spend way more time getting the output right than if I'd just written it myself.

As a quick aside there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign key columns, just like

    credit.CLIENT->NAME

And you got clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing, everything else about writing SQL is not that bad.

replies(6): >>44010730 #>>44010752 #>>44011084 #>>44011577 #>>44015959 #>>44020501 #

4. emptysea ◴[16 May 25 23:33 UTC] No.44010730[source]▶

>>44010346 #

That's one of the features of EdgeQL:

    select Movie {
      id,
      title,
      actors: {
        name
      }
    };

https://docs.geldata.com/learn/edgeql#select-objects

Although I think good enough language server / IDE could automatically insert the join when you typed `credit.CLIENT->NAME`

5. efromvt ◴[16 May 25 23:37 UTC] No.44010752[source]▶

>>44010346 #

(Shameless plug) writing the same joins over and over (and refactoring when you update stuff) was one of my biggest boilerplate annoyances with SQL - I’ve tried to fix that while still keeping the rest of SQL in https://trilogydata.dev/

6. carderne ◴[16 May 25 23:45 UTC] No.44010802[source]▶

>>44010326 #

It’s BigQuery, so it likely won’t have any of these.

replies(1): >>44011188 #

7. galenmarchetti ◴[17 May 25 00:36 UTC] No.44011084[source]▶

>>44010346 #

yeah we’re doing something similar under the hood at AstroBee. it’s way way way easier to handle joins this way.

imo any hope of really leveraging llms in this context needs this + human review on additions to a shared ontology/semantic layer so most of the nuanced stuff is expressed simply and reviewed by engineering before business goes wild with it

8. quantadev ◴[17 May 25 00:55 UTC] No.44011188{3}[source]▶

>>44010802 #

BigQuery supports all those SQL things I mentioned.

replies(1): >>44011214 #

9. carderne ◴[17 May 25 01:02 UTC] No.44011214{4}[source]▶

>>44011188 #

I’m just saying it’s likely they aren’t using them. But clearly you should if you want LLMs to do anything useful.

10. emmelaich ◴[17 May 25 02:24 UTC] No.44011577[source]▶

>>44010346 #

I'd like there to be a function or macro for a bunch of joins, say

    DEFINE products_by_order AS orders o JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id

You could make it visible to the DB rather than just a macro so it could optimise it by caching etc. Sort of like a view but on demand.

replies(1): >>44011833 #

11. icedchai ◴[17 May 25 03:32 UTC] No.44011833{3}[source]▶

>>44011577 #

Sounds like a CTE?

12. aradox66 ◴[17 May 25 18:06 UTC] No.44015959[source]▶

>>44010346 #

Can't recommend dbt highly enough to get over the SQL boilerplate problem. After using dbt for a few years it's unimaginable to work without it.

13. johnthescott ◴[18 May 25 11:22 UTC] No.44020501[source]▶

>>44010346 #

In PostgreSQL i tend to use NATURAL JOINs in the FROM clause to whittle down the distracting joins in the more relavent WHERE clause.