Getting AI to write good SQL

(cloud.google.com)

501 points richards | 4 comments | 16 May 25 21:10 UTC

nashashmi ◴[16 May 25 22:31 UTC] No.44010349[source]▶

>>44009848 (OP) #

AI text to regex solutions would be incredibly handy.

replies(4): >>44010706 #>>44011846 #>>44012981 #>>44015104 #

RadiozRadioz ◴[16 May 25 23:28 UTC] No.44010706[source]▶

>>44010349 #

This comment appears frequently and always surprises me. Do people just... not know regex? It seems so foreign to me.

It's not like it's some obscure thing, it's absolutely ubiquitous.

Relatively speaking it's not very complicated, it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's funny to joke that it looks like line noise, but really, there is not a lot to learn to understand 90% of the expressions people actually write.

It takes far longer to tell an AI what you want than to write a regex yourself.

replies(12): >>44010769 #>>44010791 #>>44010803 #>>44010854 #>>44010941 #>>44011236 #>>44011532 #>>44011584 #>>44011591 #>>44012097 #>>44012483 #>>44013224 #

1. simonw ◴[17 May 25 06:46 UTC] No.44012483[source]▶

>>44010706 #

"It takes far longer to tell an AI what you want than to write a regex yourself."

My experience is the exact opposite. Writing anything but the simplest regex by hand still takes me significant time, and I've been using them for decades.

Getting an LLM to spit out a regex is so much less work. Especially since an LLM already knows the details of the different potential dialects of regex.

I use them to write regexes in PostgreSQL, Python, JavaScript, ripgrep... they've turned writing a regex from something I expect to involve a bunch of documentation diving to something I'll do on a whim.

Here's a recent example - my prompt included a copy of a PostgreSQL schema and these instructions:

  Write me a SQL query to extract
  all of my images and their alt
  tags using regular expressions.
  In HTML documents it should look
  for either <img .* src="..." .*
  alt="..." or <img alt="..." .*
  src="..." (images may be self-
  closing XHTML style in some 
  places). In Markdown they will
  always be ![alt text](url)

I ended up with 100 lines of SQL: https://gist.github.com/simonw/5b44a662354e124e33cc1d4704cdb...

The markdown portion of that is a good example of the kind of regex I don't enjoy writing by hand, due to the need to remember exactly which characters to escape and how:

  (REGEXP_MATCHES(commentary,
  '!\[([^\]]*)\]\(([^)]*)\)', 'g'))[2] AS src,
  (REGEXP_MATCHES(commentary,
  '!\[([^\]]*)\]\(([^)]*)\)', 'g'))[1] AS alt_text
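To try the Markdown pattern outside the database, here is a minimal Python sketch using the standard `re` module. The function name and the sample text are hypothetical, and Python's regex engine is not PostgreSQL's, so treat this as an illustration of the pattern rather than a reproduction of the query.

```python
import re

# Same pattern as in the SQL above: "![", alt text up to "]", then "(", URL up to ")".
MARKDOWN_IMAGE = re.compile(r'!\[([^\]]*)\]\(([^)]*)\)')


def extract_markdown_images(text):
    """Return a list of (alt_text, src) tuples, one per Markdown image in text."""
    return MARKDOWN_IMAGE.findall(text)


# Hypothetical sample text, not taken from the original data.
sample = "Intro ![A cat](https://example.com/cat.png) and ![](b.jpg)."
images = extract_markdown_images(sample)
print(images)
```

One known limit of this pattern: a URL that itself contains a `)` will be cut off at that character.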

Full prompt and notes here: https://simonwillison.net/2025/Apr/28/dashboard-alt-text/

replies(1): >>44013215 #

2. RadiozRadioz ◴[17 May 25 09:50 UTC] No.44013215[source]▶

>>44012483 (TP) #

Perhaps Perl has given me Stockholm Syndrome, but when I look at your escaped regex example, it's extremely natural for me. In fact, I'd say it's a little too simple, because the LLM forgot to exclude unnecessary whitespace:

  (REGEXP_MATCHES(commentary,
  '!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[2] AS src,
  (REGEXP_MATCHES(commentary,
  '!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[1] AS alt_text
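The effect of the added `\s*` and lazy quantifiers can be checked with a small Python sketch. This assumes Python's `re` semantics; PostgreSQL's regex engine has its own rules for mixing greedy and non-greedy quantifiers, so the SQL version should be tested separately. The padded input is a made-up example.

```python
import re

# Pattern from the earlier comment: captures keep any surrounding whitespace.
ORIGINAL = re.compile(r'!\[([^\]]*)\]\(([^)]*)\)')

# Whitespace-trimming variant: \s* absorbs the padding, and the lazy
# quantifiers (*?) stop the captures from swallowing trailing spaces.
TRIMMED = re.compile(r'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)')

# Hypothetical input with padding inside both the brackets and the parentheses.
padded = "![  A cat  ]( https://example.com/cat.png )"

original_match = ORIGINAL.findall(padded)
trimmed_match = TRIMMED.findall(padded)
print(original_match)
print(trimmed_match)
```

In Python, the first pattern returns the padded strings unchanged, while the second returns the alt text and URL with the padding stripped.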

That is just nitpicking a one-off example though, I understand your wider point.

I appreciate the LLM is useful for problems outside one's usual scope of comfort. I'm mainly saying that I think it's a skill where the "time economics" really are in favor of learning it and expanding your scope. As in, it does not take a lot of learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly. Certainly not the case for all skills, but I truly believe regex is one of them due to its small scope and ubiquitous application. The LLM can be used for the remaining 10% of really complicated cases.

As you've been using regex for decades, there is already a large subset of problems where you're faster than the LLM. So that problem space exists; it's all about tuning learning time to right-size it for how often the problems are encountered. Regex, I think, is simple enough and frequent enough that this works very well.

replies(1): >>44013646 #

3. simonw ◴[17 May 25 11:54 UTC] No.44013646[source]▶

>>44013215 #

> As in, it does not take a lot of learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly.

It doesn't matter how fast I get at regex, I still won't be able to type any but the shortest (<5 characters) patterns out quicker than an LLM can. They are typing assistants that can make really good guesses about my vaguely worded intent.

As for learning deficit: I am learning so much more thanks to heavy use of LLMs!

Prior to LLMs the idea of using a 100 line PostgreSQL query with embedded regex to answer a mild curiosity about my use of alt text would have finished at the idea stage: that's not a high value enough problem for me to invest more than a couple of minutes, so I would not have done it at all.

replies(1): >>44013937 #

4. Agraillo ◴[17 May 25 12:52 UTC] No.44013937{3}[source]▶

>>44013646 #

Good points. Also, looking at your original example, I noticed that not only can humans explain the regularities they expect in many different ways (correcting themselves along the way), they can also simply ask an LLM to base the result on a reference. In your example you provided a template with an img tag and brackets with different attribute patterns. But one can also just ask for an HTML-style tag, as I did with the prompt "Please create a regex for extracting image file names when in a text a html-style tag img is met" (not posting the answer here, but "src" is clearly visible in it). So the "knowledge" from other domains is applied to the regex creation.
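As an illustration of what such a request might produce, here is a hedged Python sketch. It is not the answer the LLM gave (that was not posted); the pattern, the function name, and the sample HTML are all assumptions made up for this example, and a regex like this only handles simple, quoted `src` attributes.

```python
import re

# Hypothetical pattern: an <img ...> tag, then a src attribute preceded by
# whitespace (so data-src is skipped), with a single- or double-quoted value.
IMG_SRC = re.compile(r'''<img\b[^>]*?\ssrc\s*=\s*["']([^"']*)["']''', re.IGNORECASE)


def extract_img_sources(html):
    """Return the src value of every simple <img> tag found in html."""
    return IMG_SRC.findall(html)


# Made-up sample covering attribute order, tag case, and a data-src decoy.
sample_html = (
    '<p><img alt="x" src="a.png" /> '
    "<IMG data-src=\"lazy.png\" src='img/b.jpg'></p>"
)
sources = extract_img_sources(sample_html)
print(sources)
```

A file name can then be taken from each value with something like `src.rsplit("/", 1)[-1]`. For anything beyond quick exploration, a real HTML parser is the safer choice.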