Getting AI to write good SQL

1. AdrianB1 ◴[16 May 25 23:20 UTC] No.44010668[source]▶

In real life I find using AI for SQL dangerous. It allows people that don't know what they do to write queries that can significantly impact servers. In my world databases are relatively big for most developers, but not huge.

Sometimes when I want to fine tune a query I am challenging AI to provide a better solution. I give it the already optimized query and I ask for better. I never got a better answer, sometimes because AI is hallucinating or because the changes that it proposes are not working in a way that is beneficial, it is like an idiot parrot is telling what it overheard in the brothel - good info if it is a war brothel frequented by enemy officers in 1916, but not these days.

replies(5): >>44010829 #>>44011104 #>>44011385 #>>44011554 #>>44011573 #

2. awesome_dude ◴[16 May 25 23:51 UTC] No.44010829[source]▶

>>44010668 (TP) #

Mate, IME programmers who don't know what they are doing just do it anyways then look to blame someone/something else if things turn to custard.

AI is just increasing the frequency of things turning to custard :)

replies(1): >>44011027 #

3. HideousKojima ◴[17 May 25 00:26 UTC] No.44011027[source]▶

>>44010829 #

AI is most effective as an accountability sink

4. cheema33 ◴[17 May 25 00:40 UTC] No.44011104[source]▶

>>44010668 (TP) #

> I give it the already optimized query and I ask for better. I never got a better answer..

This was my experience as well. However I have observed that things have been improving this regard. Newer LLMs do perform much better. And I suspect they will continue to get better over time.

replies(1): >>44011501 #

5. ziml77 ◴[17 May 25 01:37 UTC] No.44011385[source]▶

>>44010668 (TP) #

The strategy I've used with these people is to let them prototype with AI and then have them hand over their work to me where I can then make it significantly more efficient. The nice thing is that their poor performing version acts as a reference for validating the output of my queries.

6. cjbgkagh ◴[17 May 25 02:06 UTC] No.44011501[source]▶

>>44011104 #

I’ve been working on highly optimized code that heavily uses CPU intrinsics, a year ago no chance, 6 months ago a helpful reference, today it’s a good starting point. That is an insane pace of improvement.

7. scarface_74 ◴[17 May 25 02:19 UTC] No.44011554[source]▶

>>44010668 (TP) #

> It allows people that don't know what they do to write queries that can significantly impact servers.

At least for the only OLAP DB I use often - Amazon Redshift - that’s a solved problem with Workload Management Queues. You can restrict those users ability to consume too many resources.

For queries that are used for OLTP, I usually try to keep those queries relatively simple. If there is a reason for read queries that consume resources , those go to read replicas when strong consistently isn’t required

8. strict9 ◴[17 May 25 02:23 UTC] No.44011573[source]▶

>>44010668 (TP) #

It should never be at the point where some random person can impact a server.

That's what read replicas with read-only access are for. Production db servers should not be open to random queries and usage by people. That's only for the app to use.

replies(2): >>44013003 #>>44015070 #

9. AdrianB1 ◴[17 May 25 08:57 UTC] No.44013003[source]▶

>>44011573 #

How it should be and how it is, that depends on who is the decision maker. If the decision maker is a technical person, there is no gap, but in my case the decision maker is a non-technical manager with no competence to make such decisions, but that is the way the company is organized. So letting people use AI to dig through a 1 TB database is not a good idea, while not using AI prevents them to even try. Security by oblivion.

10. sgarland ◴[17 May 25 15:40 UTC] No.44015070[source]▶

>>44011573 #

Unless you have a much more regimented code review process than anywhere I've seen, "a random person" can impact prod quite easily by introducing a bad query into the app. Since ORMs are rampant, it's probably heavily obfuscated to begin with, so they won't even see the raw SQL. At best, they'll have run it on stage, where the DB size is probably so tiny that its performance issues go unnoticed.