Most active commenters

    ←back to thread

    Getting AI to write good SQL

    (cloud.google.com)
    478 points richards | 13 comments | | HN request time: 2.264s | source | bottom
    1. mousetree ◴[] No.44010169[source]
    Out of all the AI tools and models I’ve tried, the most disappointing is the Gemini built into BigQuery. Despite having well named columns with good descriptions it consistently gets nowhere close to solving the problem.
    replies(2): >>44010326 #>>44010346 #
    2. quantadev ◴[] No.44010326[source]
    Having proper constraints and foreign keys that are clear is generally all that's needed in my experience. Are you sure your tables have well defined constraints, so that the AI can be absolutely 100% sure how everything links up? SQL is very precise, but only if you're utilizing constraints and foreign key definitions well.
    replies(1): >>44010802 #
    3. flysand7 ◴[] No.44010346[source]
    Having written more SQL than any other programming language by now, every time I've tried to use AI to write the query for me, I'd spend way more time getting the output right than if I'd just written it myself.

    As a quick aside there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign key columns, just like

        credit.CLIENT->NAME
    
    And you got clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing, everything else about writing SQL is not that bad.
    replies(6): >>44010730 #>>44010752 #>>44011084 #>>44011577 #>>44015959 #>>44020501 #
    4. emptysea ◴[] No.44010730[source]
    That's one of the features of EdgeQL:

        select Movie {
          id,
          title,
          actors: {
            name
          }
        };
    
    
    https://docs.geldata.com/learn/edgeql#select-objects

    Although I think good enough language server / IDE could automatically insert the join when you typed `credit.CLIENT->NAME`

    5. efromvt ◴[] No.44010752[source]
    (Shameless plug) writing the same joins over and over (and refactoring when you update stuff) was one of my biggest boilerplate annoyances with SQL - I’ve tried to fix that while still keeping the rest of SQL in https://trilogydata.dev/
    6. carderne ◴[] No.44010802[source]
    It’s BigQuery, so it likely won’t have any of these.
    replies(1): >>44011188 #
    7. galenmarchetti ◴[] No.44011084[source]
    yeah we’re doing something similar under the hood at AstroBee. it’s way way way easier to handle joins this way.

    imo any hope of really leveraging llms in this context needs this + human review on additions to a shared ontology/semantic layer so most of the nuanced stuff is expressed simply and reviewed by engineering before business goes wild with it

    8. quantadev ◴[] No.44011188{3}[source]
    BigQuery supports all those SQL things I mentioned.
    replies(1): >>44011214 #
    9. carderne ◴[] No.44011214{4}[source]
    I’m just saying it’s likely they aren’t using them. But clearly you should if you want LLMs to do anything useful.
    10. emmelaich ◴[] No.44011577[source]
    I'd like there to be a function or macro for a bunch of joins, say

        DEFINE products_by_order AS orders o JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id
    
    You could make it visible to the DB rather than just a macro so it could optimise it by caching etc. Sort of like a view but on demand.
    replies(1): >>44011833 #
    11. icedchai ◴[] No.44011833{3}[source]
    Sounds like a CTE?
    12. aradox66 ◴[] No.44015959[source]
    Can't recommend dbt highly enough to get over the SQL boilerplate problem. After using dbt for a few years it's unimaginable to work without it.
    13. johnthescott ◴[] No.44020501[source]
    In PostgreSQL i tend to use NATURAL JOINs in the FROM clause to whittle down the distracting joins in the more relavent WHERE clause.