Getting AI to write good SQL

(cloud.google.com)
474 points richards | 10 comments
wewewedxfgdf ◴[] No.44010757[source]
Can I just say that Google AI Studio with latest Gemini is stunningly, amazingly, game changingly impressive.

It leaves Claude and ChatGPT's coding looking like they are from a different century. It's hard to believe these changes are coming on timescales of weeks and months. Last month I could not believe how good Claude is. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.

Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.

replies(26): >>44010808 #>>44010923 #>>44011434 #>>44011854 #>>44011858 #>>44011954 #>>44012172 #>>44012250 #>>44012251 #>>44012503 #>>44012606 #>>44012629 #>>44013306 #>>44013367 #>>44013381 #>>44013473 #>>44013576 #>>44013719 #>>44013871 #>>44013899 #>>44014263 #>>44014585 #>>44014770 #>>44014917 #>>44014928 #>>44018375 #
1. oplorpe ◴[] No.44014770[source]
I’ve yet to see any LLM proselytizers acknowledge this glaring fact:

Each new release is “game changing”.

The implication being the last release y’all said was “game changing” is now “from a different century”.

Do you see it?

For this to be an accurate and true assessment, you would have to have been wrong before, and to be wrong now.

replies(3): >>44015458 #>>44015553 #>>44017817 #
2. ayrtondesozzla ◴[] No.44015458[source]
I'm not an LLM proselytiser but this makes no sense? It would almost make sense if someone were claiming there are only two possible games, the old one and the new one, and never any more. Who claims that?
replies(2): >>44015724 #>>44017022 #
3. squidbeak ◴[] No.44015553[source]
I'm unsure I fully understand your contention.

Are you suggesting that a rush to hyperbole which you don't like means advances in a technology aren't groundbreaking?

Or is it that if there is more than one impressive advance in a technology, any advance before the latest wasn't worthy of admiration at the time?

replies(1): >>44015803 #
4. oplorpe ◴[] No.44015724[source]
I suppose my point is along these lines.

When GPT-3 was trained, its creator refused to release the weights, claiming it was a “clear and present danger to civilization”. Now GPT-3 is considered a discardable toy.

So either these things are heading toward an inflection point of usefulness, or this release too will in time be mocked as a discardable toy.

So why, every three days, do we get massive threads with people fawning over the new fashion, as if this singular developing technology has ackshually, finally, become the fire stolen from the gods?

replies(1): >>44017842 #
5. oplorpe ◴[] No.44015803[source]
Yes, it’s the hyperbole.

This is, at best, an incremental development of an existing technology.

Though even that is debatable, considering the wildly differing opinions in this thread about this model versus others.

replies(1): >>44020009 #
6. trympet ◴[] No.44017022[source]
Parent is implying that we're still playing the same game.
replies(1): >>44017772 #
7. ludwik ◴[] No.44017772{3}[source]
Is "game-changing" supposed to imply changing the game to a completely different one? Like, is the metaphor that we were playing soccer, and then we switched to paintball or basketball or something? I always understood it to mean a big change within the same game - like we’re still playing soccer, but because of a goal or a shift, the team that was on defense now has to go on offense...
8. raincole ◴[] No.44017817[source]
It is just how fast this field advances compared to all the other things we've seen before. Human language doesn't have better words to describe this unusual phenomenon, so we resort to "game-changing".
9. ayrtondesozzla ◴[] No.44017842{3}[source]
Well essentially then, I agree, I find it perplexing too.

I got particularly burned by that press release a little before Christmas, where it was claimed that 4o was doing difficult maths and programming stuff. A friend told me about it very excitedly, I imagined they were talking about something that had really happened.

A few days later, when I had time to look into it, it turned out that essentially we had only internal testing and press releases to go on. I couldn't believe it. I said: so, marketing. A few months later it was revealed that a lot of the claimed results on those world-changing benchmarks were due to answers that had been leaked, etc. The usual hype theatre, in the end.

10. 59nadir ◴[] No.44020009{3}[source]
About 1.5-2 years ago I was using GitHub Copilot to write code, mostly as a boilerplate completer, really, because eventually I realized I spent too much time reading the suggestions and/or fixing the end result when I should've just written it completely myself. I did try it out with a pretty wide scope, i.e. letting it do more or less and seeing what happened. All in all it was pretty cool, and I definitely felt like there were some magic moments where it seemed to put everything together and sort of read my mind.

Anyway, that period ended and I went until a few months ago without touching anything like this, and I was hearing all these amazing things about using Cursor with Claude 3.5 Sonnet, so I decided to try it out with a few use cases:

1. Have it write a tokenizer and parser from scratch for a made up Clojure-like language

2. Have it write the parser for the language given the tokenizer I had already written previously

3. Have it write only single parsing functions for very specific things with both the tokenizer and parsing code to look at to see how it works

#1 was a complete and utter failure: it couldn't put together even a simple tokenizer, despite being shown all of the relevant parts of the host language that would enable a reasonable tokenizer end result.
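For a sense of scale, the kind of "simple tokenizer" at issue fits in a few dozen lines. A sketch in Python (the thread never specifies the host language, and the token names and delimiter set here are hypothetical illustrations, not the poster's actual code):

```python
import re

# Token kinds for a minimal Clojure-like syntax. Names are hypothetical.
TOKEN_SPEC = [
    ("WHITESPACE", r"[\s,]+"),           # Clojure treats commas as whitespace
    ("COMMENT",    r";[^\n]*"),          # line comments start with ;
    ("STRING",     r'"(?:\\.|[^"\\])*"'),
    ("NUMBER",     r"-?\d+(?:\.\d+)?"),
    ("KEYWORD",    r":[A-Za-z_+\-*/=<>!?][\w+\-*/=<>!?]*"),
    ("SYMBOL",     r"[A-Za-z_+\-*/=<>!?][\w+\-*/=<>!?]*"),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("LBRACKET",   r"\["),
    ("RBRACKET",   r"\]"),
    ("LBRACE",     r"\{"),
    ("RBRACE",     r"\}"),
    ("QUOTE",      r"'"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (kind, text) pairs, skipping whitespace and comments."""
    pos = 0
    while pos < len(source):
        m = MASTER_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup not in ("WHITESPACE", "COMMENT"):
            yield (m.lastgroup, m.group())

tokens = list(tokenize('(defn add [x y] (+ x y))'))
# tokens[:3] == [("LPAREN", "("), ("SYMBOL", "defn"), ("SYMBOL", "add")]
```

The pattern order matters: KEYWORD is tried before SYMBOL so `:foo` is not split, and NUMBER before SYMBOL so `-5` is a number while a bare `-` falls through to a symbol.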

#2 was only slightly better, but the end results were nowhere near usable, and even after iteration it couldn't produce a runnable result.
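Similarly, the "parser given an existing tokenizer" task (#2) for an s-expression syntax amounts to a short recursive-descent reader. A self-contained Python sketch, assuming hypothetical (kind, text) token pairs rather than the poster's real tokenizer:

```python
# Minimal recursive-descent reader for a Clojure-like token stream.
# The (kind, text) token shape is an assumption, not the thread's actual code.
def parse(tokens):
    """Turn a flat list of (kind, text) tokens into nested Python lists."""
    pos = 0

    def read_form():
        nonlocal pos
        kind, text = tokens[pos]
        pos += 1
        if kind == "LPAREN":
            form = []
            while tokens[pos][0] != "RPAREN":
                form.append(read_form())
            pos += 1  # consume the closing RPAREN
            return form
        if kind == "RPAREN":
            raise SyntaxError("unexpected )")
        return text  # atoms stay as their source text

    form = read_form()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after first form")
    return form

tree = parse([("LPAREN", "("), ("SYMBOL", "+"), ("NUMBER", "1"),
              ("NUMBER", "2"), ("RPAREN", ")")])
# tree == ["+", "1", "2"]
```

Only lists and atoms are handled here; vectors, maps, and quote would each add one more branch to `read_form`.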

#3 is the first one my previous experience with Copilot suggested should be doable. We started out pretty badly: it misunderstood one of the tokenizer functions it had examples for and used it in a way that doesn't really make sense given the example. After that it also wanted to re-add functions it had already added, for some reason. I ran into myriad issues just getting it to correct itself, move on, or do something productive, until I called it quits.

My personal conclusion from all of this is that yes, it's all incredibly incremental: any kind of "coding companion" or agent has basically the same failure modes it had years ago, and much of that hasn't improved all that much.

The odds that I could do my regular work on 3D engines with the coding companions out there are slim to none when they can't even put together something as simple as a tokenizer, or use an already existing one to write some simple tokenizer functions. For reference, a colleague who had never written either of those things took 30 minutes to productively and correctly use exactly the same libraries the LLM was given.