Getting AI to write good SQL

(cloud.google.com)
478 points richards | 12 comments
1. tango12 ◴[] No.44010584[source]
What’s the eventual goal of text to sql?

Is it to build a copilot for a data analyst or to get business insight without going through an analyst?

If it’s the latter - then imho no amount of text to sql sophistication will solve the problem because it’s impossible for a non analyst to understand if the sql is correct or sufficient.

These don’t seem like text2sql problems:

> Why did we hit only 80% of our daily ecommerce transaction yesterday?

> Why is customer acquisition cost trending up?

> Why was the campaign in NYC worse than the same in SF?

replies(5): >>44010646 #>>44010660 #>>44010746 #>>44010772 #>>44011353 #
2. cdavid ◴[] No.44010646[source]
My observation is the latter, but I agree the results fall short of expectations. Business will often want last-minute changes in reporting, not get what they want at the right time because of a lack of analysts, and hope that having "infinite speed" will solve the problem.

But ofc the real issue is that if your report metrics change last minute, you're unlikely to get a good report. That's a symptom of not thinking much about your metrics.

Also, reports / analysis generally take time because the underlying data are messy, lots of business knowledge is encoded "out of band", and the data infrastructure is poor. The smarter analytics leaders will use the AI push to invest in the foundations.

3. mynegation ◴[] No.44010660[source]
To be fair, these don’t look like SQL problems either. SQL answers “what”, not “why” questions. The goal of text2sql is to free up analyst time to get through “what” much faster and - possibly - focus on “why” questions.
4. phillipcarter ◴[] No.44010746[source]
> These don’t seem like text2sql problems:

Correct, but I would propose two things to add to your analysis:

1. Natural language text is a universal input to LLM systems

2. text2sql makes the foundation of retrieving the information that can help answer these higher-level questions

And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.

There's clearly much work needed to achieve that goal.

replies(1): >>44011125 #
5. richardw ◴[] No.44010772[source]
Any algo that a human would follow can be built and tested. If you have 10 analysts you have 10 different skill levels, with differing understanding of the database and business context. So automation gives you a platform to achieve a floor of skill and knowledge. The humans can now be “at least this good or better”. A new analyst instantly gets better, faster.

I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade-offs, potential bugs, sense check it against expected results etc.

Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.

replies(1): >>44010901 #
6. layer8 ◴[] No.44010901[source]
But “text to sql” isn’t an algorithm.
replies(1): >>44011080 #
7. richardw ◴[] No.44011080{3}[source]
The processes people want the sql for are likely filled with algos. An exec wants info in a known domain: set up a text to sql system with lots of context and testing to generate queries. If they think they have something good, get an expert to test and productionise it.

“Thank you for your request. Can you walk me through the steps you’d use to do this manually? What things would you watch out for? What kind of number ranges are reasonable? I can propose an algorithm and you tell me if that’s correct. The admins have set up guidelines on how to reason about customer and purchase data. Is the following consistent with your expectations?”
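
A minimal sketch of the "what kind of number ranges are reasonable?" step, assuming the generated query returns a single number and a domain expert has supplied plausible bounds. The `orders` table, the query text, the bounds, and the helper name `check_generated_sql` are all hypothetical, and the "generated" SQL is hardcoded here rather than produced by any real text to sql system.

```python
import sqlite3


def check_generated_sql(conn, sql, low, high):
    """Run a single-value query and report whether the result lies in [low, high]."""
    row = conn.execute(sql).fetchone()
    value = row[0] if row else None
    # A NULL result (e.g. SUM over zero rows) is treated as failing the check.
    ok = value is not None and low <= value <= high
    return value, ok


# Toy in-memory database standing in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (day, amount) VALUES (?, ?)",
    [("2025-05-01", 120.0), ("2025-05-01", 80.0), ("2025-05-02", 50.0)],
)

# Hypothetical output of a text to sql system for "total revenue on 2025-05-01".
generated_sql = "SELECT SUM(amount) FROM orders WHERE day = '2025-05-01'"

# Bounds the exec or admin says are reasonable for a day's revenue (assumption).
value, ok = check_generated_sql(conn, generated_sql, low=100.0, high=500.0)

# A query that filters on a day with no data returns NULL and fails the check.
bad_sql = "SELECT SUM(amount) FROM orders WHERE day = '2025-05-03'"
bad_value, bad_ok = check_generated_sql(conn, bad_sql, low=100.0, high=500.0)

print(value, ok)
print(bad_value, bad_ok)
```

A range check like this only catches results that are wildly off. A query that is wrong but lands inside the expected range still passes, so it is a floor, not a proof of correctness.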

replies(1): >>44011142 #
8. galenmarchetti ◴[] No.44011125[source]
yeah I agree with this - good text2sql is essential but just one part of a larger stack that will actually get there. Seems possible tho
9. layer8 ◴[] No.44011142{4}[source]
This is the same fallacy as low-code/no-code. If you have to check a precise algorithm, you’re effectively coding, and you need a language with the same precision as a programming language.
replies(1): >>44012160 #
10. ◴[] No.44011353[source]
11. richardw ◴[] No.44012160{5}[source]
Only if you want a production-ready output. To get execs able to self-feed enough, this works fine. Look, you don’t see value until it’s perfect. Good, other people do. I see your fallacy and raise you a false dichotomy.
replies(1): >>44014347 #
12. layer8 ◴[] No.44014347{6}[source]
The problem I see is how do you verify that the result of your text-to-sql is really what you were asking for, without understanding the SQL (or “the algorithm”)? It boils down to this: you have to know what you are doing, and with the present state of the art of AI we can’t have confidence in that.