
Getting AI to write good SQL

(cloud.google.com)
476 points by richards | 100 comments
1. wewewedxfgdf ◴[] No.44010757[source]
Can I just say that Google AI Studio with latest Gemini is stunningly, amazingly, game changingly impressive.

It leaves Claude and ChatGPT's coding looking like they are from a different century. It's hard to believe these changes are coming in the span of weeks and months. Last month I could not believe how good Claude is. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.

Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.

replies(26): >>44010808 #>>44010923 #>>44011434 #>>44011854 #>>44011858 #>>44011954 #>>44012172 #>>44012250 #>>44012251 #>>44012503 #>>44012606 #>>44012629 #>>44013306 #>>44013367 #>>44013381 #>>44013473 #>>44013576 #>>44013719 #>>44013871 #>>44013899 #>>44014263 #>>44014585 #>>44014770 #>>44014917 #>>44014928 #>>44018375 #
2. insin ◴[] No.44010808[source]
Is it just me or did they turn off reasoning mode in free Gemini Pro this week?

It's pretty useful as long as you hold it back from writing code too early, or too generally, or sometimes at all. It's a chronic over-writer of code, too. Ignoring most of what it attempts to write and using it to explore the design space without ever getting bogged down in code and other implementation details is great though.

I've been doing something that's new to me but is going to be all over the training data (a subscription service using Stripe), and I've often been able to pivot the planned design of different aspects before writing a single line of code, because I can get all the data it already has regurgitated in the context of my particular tech stack and use case.

replies(2): >>44010907 #>>44012369 #
3. CuriouslyC ◴[] No.44010907[source]
I think reasoning in the studio is gated by load. Around the time I wasn't seeing much reasoning in AI Studio, I was also getting Vertex "service overloaded" errors pretty frequently on my agents.
4. CuriouslyC ◴[] No.44010923[source]
I'm really surprised more people haven't caught on. Claude can one-shot small stuff of similar complexity, but as soon as you start to really push the model into longer, more involved use cases, Gemini pulls way ahead. The context handling is so impressive: in addition to using it for coding agents, I use Gemini as a beta reader for a fairly long manuscript (~85k words) and it absolutely nails it, providing, in seconds, a high-level report comparable to what a solid human beta reader would produce.
replies(3): >>44010944 #>>44011563 #>>44013373 #
5. wewewedxfgdf ◴[] No.44010944[source]
It is absolutely the greatest golden age in programming ever - all these infinitely wealthy companies spending bajillions competing on who can make the best programming companion.

Apart from the apologising. It's silly when the AI apologises with ever more sincere apologies. There should be no apologies from AIs.

replies(5): >>44011045 #>>44011815 #>>44013284 #>>44013588 #>>44013881 #
6. thingsilearned ◴[] No.44011045{3}[source]
companion or replacement?
replies(3): >>44011187 #>>44012053 #>>44012242 #
7. Terr_ ◴[] No.44011187{4}[source]
... Or saboteur. :p
8. noosphr ◴[] No.44011434[source]
It always is for the first week. Then you find out that the last 10% matter a lot more than the other 90%. And finally they turn off the high-compute version and you're left with a brain-dead model that loses to a 32b local model half the time.
replies(1): >>44015307 #
9. koakuma-chan ◴[] No.44011563[source]
And Gemini is free.
replies(3): >>44011998 #>>44012498 #>>44012727 #
10. yujzgzc ◴[] No.44011815{3}[source]
You're absolutely right! My mistake. I'll be careful about apologizing too much in the future.
replies(1): >>44012834 #
11. petesergeant ◴[] No.44011854[source]
Is this distinct from using Gemini 2.5 Pro? If not, this doesn’t match my experience — I’ve been getting a lot of poorly designed TypeScript with an excess of very low quality comments.
replies(1): >>44013729 #
12. alostpuppy ◴[] No.44011858[source]
How do you use it exactly? Does it integrate with any IDEs?
replies(6): >>44012050 #>>44012300 #>>44012332 #>>44012390 #>>44012847 #>>44013858 #
13. landl0rd ◴[] No.44011954[source]
Really? I get goofy random substitutions, sometimes from foreign languages. It also doesn't do well on my mini-tests of "can you write modern Svelte without inserting React" and "can you fix a borrow-checking issue in Rust with lifetimes, not Arc/Cell slop".

That doesn't mean it's worse than the others, just not much better. I haven't found anything that works better than o1-preview so far. How are you using it?

14. scuol ◴[] No.44011998{3}[source]
Well, as with many of Google's services, you pay with your data.

Pay-as-you-go with Gemini does not snort your data for their own purposes (allegedly...).

replies(2): >>44012270 #>>44014975 #
15. Mossy9 ◴[] No.44012050[source]
Jetbrains AI recently added (beta) access to Gemini Pro 2.5 and there's of course plugins like Continue.dev that provide access to pretty much anything with an API
16. tonyhart7 ◴[] No.44012053{4}[source]
They would replace the entire software department, until the AI introduces bugs because of the endless changes to your JavaScript framework, and then they would hire humans again to fix it.

We are literally creating the solution to our own problem.

replies(1): >>44015575 #
17. surgical_fire ◴[] No.44012242{4}[source]
They are a replacement if your job is only to write code.

Especially if your code contains a few bugs and misconceptions, and you're sometimes completely unable to fix mistakes, going back and forth between the same wrong solutions.

This is not to say that AI assistants are useless. They are a good productivity tool, and I can output code much faster, especially for domains I am very familiar with.

That said, these starry-eyed AI circlejerk threads are incredibly cringe.

18. ifellover ◴[] No.44012250[source]
Absolutely agree. I really pushed it last week with a screenshot of a very abstract visualisation we'd done in a Miro board; we couldn't find a library that did exactly what we wanted, so we turned to Gemini.

Essentially we were hoping to tie that to data inputs and have a system to regularly output the visualisation, but with dynamic values. I bet my colleague it would one-shot it: it did.

What I've also found is that even with a sloppy prompt it still somehow reads my mind about what to do, even though I've expressed myself poorly.

Conversely, I've really found myself rejecting suggestions from ChatGPT, even o4-mini-high. It's just doing so much random crap I didn't ask for, and the code is… let's say not as "Gemini" as I'd prefer.

19. in_ab ◴[] No.44012251[source]
I asked it to make some changes to the code it wrote. But it kept pumping out the same code with more and more comments to justify itself. After the third attempt I realized I could have done it myself in less time.
20. maksimur ◴[] No.44012270{4}[source]
Undoubtedly, but a significant positive aspect is the democratization of this technology: it enables access for people who could not afford it. Not for productive use, that is.
21. miyuru ◴[] No.44012300[source]
There is Gemini Code Assist.

https://developers.google.com/gemini-code-assist/docs/overvi...

22. pimeys ◴[] No.44012332[source]
Zed supports it out of the box.
23. energy123 ◴[] No.44012369[source]
They rolled out a new model a week ago which has a "bug" where in long chats it forgets to emit the tokens required for the UI to detect that it's reasoning. You can remind it that it needs to emit these tokens, which helps, or accept that it will sometimes fail to do it. I don't notice a deterioration in performance because it is still reasoning (you can tell by the nature of the output), it's just that those tokens aren't in <think> tags or whatever's required by the UI to display it as such.
24. beauzero ◴[] No.44012390[source]
Give Cline + vscode a try. Make sure to implement the "memory bank"...see Cline docs at cline.bot
replies(1): >>44012738 #
25. harvey9 ◴[] No.44012498{3}[source]
The first hit is always free
replies(2): >>44013270 #>>44013594 #
26. Der_Einzige ◴[] No.44012503[source]
Shhh!!! Normies will catch on and google will stop making it free.

But more seriously, they need to uncap temperature and allow more samplers if they want to really flex on their competition.

replies(1): >>44014836 #
27. alecco ◴[] No.44012606[source]
Remember when Microsoft started to do good things? Big corps suck when they are on top and unchallenged. It's imperative to reduce their monopolies.
replies(1): >>44013288 #
28. lifty ◴[] No.44012629[source]
Excuse my ignorance, but is the good experience somehow influenced by Google AI Studio as well or only by the capability of the model itself? I know Gemini 2.5 is good, have been using it myself for a while. I still switch between Sonnet and Gemini, because I feel Claude code does some things better.
29. nativeit ◴[] No.44012727{3}[source]
We’re all paying for this. In this case, the costs are only abstract, rather than the competing subscription options that are indeed quite tangible _and_ abstract.
30. sexy_seedbox ◴[] No.44012738{3}[source]
Roo Code + Roo Commander + Openrouter (connecting Gemini with Vertex AI) + Context7
31. DonHopkins ◴[] No.44012834{4}[source]
You sound like a Canadian LLM!
replies(1): >>44015299 #
32. DonHopkins ◴[] No.44012847[source]
Just install Cursor, it supports Gemini and many other LLMs right out of the box.
replies(1): >>44013559 #
33. hfgjbcgjbvg ◴[] No.44013270{4}[source]
Real.
34. paganel ◴[] No.44013284{3}[source]
> It is absolutely the greatest golden age in programming ever

It depends, because you now have to pay in order to compete against other programmers who're also using AI tools. It wasn't like that in what I'd call the true "golden age", basically the '90s and early 2000s, when the internet was already a thing and you could put together something very cool with just a "basic" text editor.

replies(1): >>44016089 #
35. Gud ◴[] No.44013288[source]
No, I don’t.
replies(1): >>44013356 #
36. DHolzer ◴[] No.44013306[source]
Without wanting to sound overly sceptical: what exactly makes you think it performs so much better than Claude and ChatGPT?

Is there any concrete example that makes it really obvious? I've had no such success with it so far, and I would really like to see the clear gap between Gemini and the others.

37. ionwake ◴[] No.44013356{3}[source]
lmao
38. yahoozoo ◴[] No.44013367[source]
Nice try, Mr. Google.

But seriously, yeah, Gemini is pretty great.

39. snthpy ◴[] No.44013373[source]
I also used it to "vibe write" a short story. I use it similarly to vibe coding: I gave it the theme and structure of the story, along with the major sections, tensions, and conflicts I wanted to express, and it filled in the words in my chosen style. I also created an editor persona, and then we went back and forth between the editor and writer personas to refine the story.

The Omega Directive: https://snth.prose.sh/the_omega_directive

replies(1): >>44013489 #
40. reacharavindh ◴[] No.44013381[source]
I use Gemini 2.5 Pro through work and it is excellent. However, for personal use I use Claude 3.7 Sonnet via the API, with money added to the account up front.

I couldn't find a way to use Gemini like a prepaid plan. I ain't giving my credit card to Google for an LLM that can easily charge me hundreds or thousands of EUR.

replies(2): >>44013778 #>>44015005 #
41. bossyTeacher ◴[] No.44013473[source]
> Today I'm not sure how I could continue programming without Google Gemini in my toolkit

Is anyone else concerned about these kinds of statements? Make no mistake, everyone: we are living in an LLM bubble (not an AI bubble, as none of these companies are actually interested in AI as such, i.e. in moving towards AGI). They are all trying to commercialise LLMs with some minor tweaks. I don't expect LLMs to make the kind of progress made by the first three iterations of GPT, and when the insanely hyped overvaluations crash, the bubble WILL burst. You BETTER hope there is money left to run these kinds of tools at a profit, or you will be back on Stack Overflow trying to relearn all the skills you lost to generative coding tools.

42. CuriouslyC ◴[] No.44013489{3}[source]
My writing process is a bit different from my coding process with AI, it's more of an iterative refinement process.

I tend to form the story arc in my head, and outline the major events in a timeline, and create very short summaries of important scenes, then use AI to turn those summaries into rough narrative outlines by asking me questions and then using my answers to fill in the details.

Next I'll feed that abbreviated manuscript into AI and brainstorm as to what's missing/where the flow could use improvement/etc with no consideration for prose quality, and start filling in gaps with new scenes until I feel like I have a compelling rough outline.

Then I just plow from beginning to end rewriting each chapter, first with AI to do a "beta" draft, then I rewrite significant chunks by hand to make things really sharp.

After this is done I'll feed the manuscript back into AI and get it to beta read given my target audience profile and ambitions for the book, and ask it to provide me feedback on how I can improve the book. Then I start editing based on this, occasionally adding/deleting scenes or overhauling ones that don't quite work based on a combination of my and AI's estimation. When Gemini starts telling me it can't think of much to improve the manuscript that's when it's time for human beta readers.

replies(2): >>44013585 #>>44013865 #
43. johnisgood ◴[] No.44013559{3}[source]
Unfortunately I cannot use Cursor, not until they fix https://github.com/getcursor/cursor/issues/598.

What about Zed or something else?

I have not used any IDEs like Cursor or Zed, so I am not sure what I should be using (on Linux). I typically just get on Claude (claude.ai) or ChatGPT and do everything manually. It has worked fine for me so far, but if there is a way to reduce friction, I am willing to give it a try. I do not really need anything advanced, however. I just want to feed it the whole codebase (at times), some documentation, and then provide prompts. I mostly care about support for Claude and perhaps Gemini (would like to try it out).

44. conartist6 ◴[] No.44013576[source]
You don't worry that you can't think anymore without paying google to think for you?
replies(1): >>44013727 #
45. snthpy ◴[] No.44013585{4}[source]
Thank you for sharing that. I'm going to try that, up to "then I rewrite significant chunks by hand to make things really sharp". I'm not a writer and would never have dreamed of writing anything until I gave this a try. I've often had ideas for stories, though, and using Gemini to bring these to "paper" has felt like a superpower, similar to how it must feel for people who can't code but are now able to create apps thanks to AI. I think it's a really exciting time!

I've been wondering about the legalities of the generated content, though, since we know that a lot of the artistic source content was used without consent. Can I put the stories on my blog? Or, not that I want to, publish them? I guess people use AI-generated code everywhere, so for practical purposes the cat is out of the bag and won't be put back in again.

replies(1): >>44013648 #
46. conartist6 ◴[] No.44013588{3}[source]
Wow yeah I'm old enough to remember when the focus wasn't on the programmers, but on the people the programs were written for.

We used to serve others, but now people are so excited about serving themselves first that there's almost no talk of service to others at all anymore

47. conartist6 ◴[] No.44013594{4}[source]
The investors know it. They're not competing to own this shit like it's gonna stay free.
48. CuriouslyC ◴[] No.44013648{5}[source]
If you've put manual work into curating and assembling AI output, you have copyright. It's only not copyrightable if you had the AI one shot something.
49. Eezee ◴[] No.44013719[source]
I tried it out because of your comment, and on the very first prompt Gemini 2.5 Pro hallucinated a non-existent plugin, including detailed usage instructions.

Not really my idea of good.

replies(3): >>44014915 #>>44016065 #>>44016879 #
50. conartist6 ◴[] No.44013727[source]
OK, a better scenario than that: for some reason they cut you off. They're a huge company, they don't really care, and you would have no recourse. Many people live this story. Where once you were a programmer, if Google convinces you to eliminate your self-reliance they can then remotely turn off you being a programmer. There are other people who will use those GPU cycles to be programmers! Google will still make money.
51. christophilus ◴[] No.44013729[source]
The comments drive me nuts.

// Moved to foo.ts

Ok, great. That’s what git is for.

// Loop over the users array

Ya. I can read code at a CS101 level, thanks.

52. b0ringdeveloper ◴[] No.44013778[source]
Try OpenRouter. Load up with $20 of credits and use their API for a variety of models across providers, including Gemini. I think you pay ~5% extra for the OpenRouter service.
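
Roughly what that looks like from Python, if it helps: OpenRouter exposes an OpenAI-compatible endpoint, so the standard openai client works with a different base URL (the model slug below is a guess; check their model list):

    from openai import OpenAI

    # OpenRouter speaks the OpenAI chat-completions protocol; one prepaid key
    # gives you access to Gemini, Claude, and the rest.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key, funded with prepaid credits
    )

    resp = client.chat.completions.create(
        model="google/gemini-2.5-pro-preview",  # assumed slug; pick one from openrouter.ai/models
        messages=[{"role": "user", "content": "Explain this query plan."}],
    )
    print(resp.choices[0].message.content)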
replies(1): >>44014629 #
53. kdmtctl ◴[] No.44013858[source]
Copilot has it in preview. I found it digs deeper on DevOps tasks in Agent mode. But context matters: you should include everything, and it will push through. Now I switch between Claude and Gemini when one of them starts going in circles. Gemini could certainly handle more context, but Copilot clearly limits it. I didn't try with an AI Studio key though, only the default settings.
54. priceofmemory ◴[] No.44013865{4}[source]
That sounds very similar to my AI vibe-writing process. Start with chapter outlines, then ask the AI to fill in the details for each scene. Then ask the AI to point out any plot holes or areas for improvement in the chapter (in relation to other chapters). Then go through chapter by chapter for a second rewrite doing the same thing. I'm at ~100k words for a fan-fiction novel, but expect to be at about 120k after this latest rewrite.

https://frypatch.github.io/The-Price-of-Remembering/

55. the_arun ◴[] No.44013871[source]
Are you talking about Firebase Studio?
56. theropost ◴[] No.44013881{3}[source]
I wish my AI would tell me when I'm going in the wrong direction, instead of just placating my stupid request over and over until I realize it myself. It probably could have suggested a smarter direction, but instead it just told me "Great idea!"
replies(4): >>44014908 #>>44014910 #>>44016378 #>>44018328 #
57. mayas_ ◴[] No.44013899[source]
I guess it depends on the type of tasks you give it.

They all seem to work remarkably well writing TypeScript or Python, but in my experience they fall short when it comes to shell scripting and, more broadly, DevOps.

replies(1): >>44018081 #
58. CommenterPerson ◴[] No.44014263[source]
Sorry, this sounds like a marketing plug.
59. belter ◴[] No.44014585[source]
> Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it

Every time in the last three or four weeks that there is a post here about Gemini, the top comment or one of the top comments is something along these lines. And every time, I spend a few minutes running empirical tests to check whether I made a mistake in cancelling my paid Gemini account after giving up on it...

So I just did a couple of tests, sending the same prompts on some AWS-related questions to Gemini Pro 2.5 (free) and to paid Claude, and no, Claude is still better.

replies(1): >>44015030 #
60. belter ◴[] No.44014629{3}[source]
Do you work for OpenRouter?
61. oplorpe ◴[] No.44014770[source]
I’ve yet to see any llm proselytizers acknowledge this glaring fact:

Each new release is “game changing”.

The implication being that the last release y'all said was "game changing" is now "from a different century".

Do you see it?

For this to be an accurate and true assessment, you must have been wrong before and must be wrong now.

replies(3): >>44015458 #>>44015553 #>>44017817 #
62. energy123 ◴[] No.44014836[source]
Can you explain what you mean by "uncapping" temperature and "samplers"? You can currently set the temperature to whatever you want. Or do you want a temperature > 2?
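
For reference, here's roughly where those knobs live in the google-generativeai Python SDK; this is a sketch, with the model name as a placeholder, and as far as I know top_p/top_k are basically the only sampler controls the API exposes (which may be what you're getting at). The documented maximum temperature for recent Gemini models is 2.0:

    import google.generativeai as genai

    genai.configure(api_key="...")
    # Placeholder model name; substitute whichever 2.5 model you have access to.
    model = genai.GenerativeModel("gemini-1.5-pro")

    resp = model.generate_content(
        "Write a SQL query that ...",
        generation_config={
            "temperature": 2.0,  # 2.0 is the documented max for recent Gemini models
            "top_p": 0.95,
            "top_k": 64,
        },
    )
    print(resp.text)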
63. Workaccount2 ◴[] No.44014908{4}[source]
I don't know if you have used 2.5, but it is the first model to disagree with directions I have provided...

"..the user suggests using XYZ to move forward, but that would be rather inefficient, perhaps the user is not totally aware of the characteristics of XYZ. We should suggest moving forward with ABC and explain why it is the better choice..."

replies(1): >>44015360 #
64. ◴[] No.44014910{4}[source]
65. gundmc ◴[] No.44014915[source]
Can you provide your prompt? This hasn't matched my experience. You can also try enabling search grounding in the right hand bar. You have to also explicitly tell it in your prompt to use grounding with Google Search, but I've had very good success with that even for recent or niche plugins/libraries.
replies(1): >>44015566 #
66. ◴[] No.44014917[source]
67. MrDarcy ◴[] No.44014928[source]
I’ve felt the same, but what is the equivalent of Claude code in Google’s ecosystem?

I want something running in a VM I can safely let all tools execute without human confirmation and I want to write my own tools and plug them in.

Right now a pro max subscription with Claude code plus my MCP servers seems to be the sweet spot, and a cursory look at the Google ecosystem didn’t identify anything like it. Am I overlooking something?

replies(1): >>44015243 #
68. Workaccount2 ◴[] No.44014975{4}[source]
The cost to Google of lying about data privacy far exceeds the profit gained from using the data. Alienate your most valuable customers (enterprise) so you can get 10% more training data? And almost certainly end up in a sea of lawsuits from them?

Not happening. Investors would riot.

replies(1): >>44015451 #
69. ◴[] No.44015005[source]
70. Workaccount2 ◴[] No.44015030[source]
Can you share the prompts?
71. thinkxl ◴[] No.44015243[source]
I think using Aider[1] with Google's models is the closest.

It's my daily driver so far. I switch between the Claude and Gemini models depending on the type of work I'm doing. When I know exactly what I want, I use Claude. When I'm experimenting and discovering, I use Gemini.

[1]: https://aider.chat/docs/llms/gemini.html

72. smoyer ◴[] No.44015299{5}[source]
Eh?
73. Barbing ◴[] No.44015307[source]
If a user eventually creates half a dozen projects with an API key for each, and prompts Gemini side-by-side under each key, and only some of the responses are consistently terrible…

Would you expect that to be Google employing cost-saving measures?

74. redog ◴[] No.44015360{5}[source]
It really gave me a lot of push back once when I wanted to use a js library over a python one for a particular project. Like I gave it my demo code in js and it basically said, "meh, cute but use this python one because ...reasons..."
replies(1): >>44016396 #
75. ayrtondesozzla ◴[] No.44015451{5}[source]
Indeed, the first stage of the enshittification process requires mollycoddling the customer in a convincing manner.

Looking forward to stage 2 - start serving the advertisers while placating the users, and finally stage 3 - offering it all up to the investors while playing the advertisers off each other and continuing to placate the users.

76. ayrtondesozzla ◴[] No.44015458[source]
I'm not an LLM proselytiser but this makes no sense? It would almost make sense if someone were claiming there are only two possible games, the old one and the new one, and never any more. Who claims that?
replies(2): >>44015724 #>>44017022 #
77. squidbeak ◴[] No.44015553[source]
I'm unsure I fully understand your contention.

Are you suggesting that a rush to hyperbole which you don't like means advances in a technology aren't groundbreaking?

Or is it that if there is more than one impressive advance in a technology, any advance before the latest wasn't worthy of admiration at the time?

replies(1): >>44015803 #
78. th0ma5 ◴[] No.44015566{3}[source]
So glad we're pinning the success and learning of new technology on random anecdotes. Do pro AI people not see how untenable it is where everything is a rumor?
replies(1): >>44015936 #
79. roflyear ◴[] No.44015575{5}[source]
Or, just let their users deal with the bugs b/c churn will be less than the cost of developers.
replies(1): >>44018617 #
80. oplorpe ◴[] No.44015724{3}[source]
I suppose my point is along these lines.

When GPT-3 was trained, its parent company refused to release the weights, claiming it was a "clear and present danger to civilization". Now GPT-3 is considered a discardable toy.

So either these things are heading toward an inflection point of usefulness, or this release will, in time, be mocked as a discardable toy too.

So why, every 3 days, do we get massive threads with people fawning over the new fashion, as if this singular developing technology has ackshually finally become the fire stolen from the gods?

replies(1): >>44017842 #
81. oplorpe ◴[] No.44015803{3}[source]
Yes, it’s the hyperbole.

This is, at best, an incremental development of an existing technology.

Though even that is debatable considering the wildly differing opinions in this thread in regards to this vs other models.

replies(1): >>44020009 #
82. cooperaustinj ◴[] No.44015936{4}[source]
It is only a rumor to people who refuse to put in effort.
replies(2): >>44017452 #>>44017994 #
83. patrick451 ◴[] No.44016065[source]
This has consistently been my experience with every LLM I have tried. Everybody says "Oh, you tried the model from two months ago? Doesn't count, the new ones are sooo much better." So I try the new one and it still hallucinates.
84. danieldk ◴[] No.44016089{4}[source]
One could put something cool together without internet using Delphi. The Borland IDEs were ahead of their time - built-in debugger, profiler, and pretty good documentation. My 'internet' was the SWAG Pascal snippet collection (which could be used fully offline). Someone converted it to HTML:

http://www.retroarchive.org/swag/index.html

85. rad_gruchalski ◴[] No.44016378{4}[source]
You must be confusing „intelligence” with „statistically most probable next word”.
86. rad_gruchalski ◴[] No.44016396{6}[source]
Wow, you can now pay to have „engineers” overruled by artificial „intelligence”? People who have no idea are now going to be corrected by an LLM which has no idea, by design. Look, even if it gets a lot of things right, it's still trickery.

I'll get popcorn and wait for more work coming my way 5 years down the road. Someone will have to tidy this mess up, and gen-covid will have lost all ability to think on their own by then.

87. Jnr ◴[] No.44016879[source]
Same here. I thought I would give it a shot, and it got the first solution in a simple Next.js TypeScript project so wrong that I laughed out loud. It was fast but incorrect.
88. trympet ◴[] No.44017022{3}[source]
Parent is implying that we're still playing the same game.
replies(1): >>44017772 #
89. mostlysimilar ◴[] No.44017452{5}[source]
I'd rather put my effort into developing my own skills, not hand-holding a hallucinating robot.
replies(1): >>44018020 #
90. ludwik ◴[] No.44017772{4}[source]
Is "game-changing" supposed to imply changing the game to a completely different one? Like, is the metaphor that we were playing soccer, and then we switched to paintball or basketball or something? I always understood it to mean a big change within the same game - like we’re still playing soccer, but because of a goal or a shift, the team that was on defense now has to go on offense...
91. raincole ◴[] No.44017817[source]
It is just how fast this field advances compared to all the other things we've seen before. Human language doesn't have better words to describe this unusual phenomenon, so we resort to "game-changing".
92. ayrtondesozzla ◴[] No.44017842{4}[source]
Well essentially then, I agree, I find it perplexing too.

I got particularly burned by that press release a little before Christmas, where it was claimed that 4o was doing difficult maths and programming stuff. A friend told me about it very excitedly; I imagined they were talking about something that had really happened.

A few days later, when I had time to look into it, it turned out that essentially all we had to go on was internal testing and press releases. I couldn't believe it. I said: so, marketing. A few months later it was revealed that a lot of the claimed results on those world-changing benchmarks were due to answers that had been leaked, etc. The usual hype theatre, in the end.

93. th0ma5 ◴[] No.44017994{5}[source]
What does this even mean? Lol.
94. -__---____-ZXyw ◴[] No.44018020{6}[source]
I enjoyed the clarity of that sentence. It's wild to read. Some people are choosing the hand-holding of the hallucinating robot instead of developing their skills, and simultaneously training their replacement (or so the bosses hope, anyway).

I wonder if "robot" was being used here in its original sense too of a "forced worker" rather than the more modern sense of "mechanical person". If not, I propose it.

95. ◴[] No.44018081[source]
96. stirfish ◴[] No.44018328{4}[source]
One trick I found is to tell the LLM that an LLM wrote the code, whether it did or not. The machine doesn't want to hurt your feelings, but it loves to tear apart code it thinks it might've written.
replies(1): >>44018387 #
97. teleforce ◴[] No.44018375[source]
There's a complex NumPy indexing riddle in the "I don't like NumPy indexing" section, and Gemini Pro 2.5 came out on top (DeepSeek R1 only got it right the first time, not on later attempts) [1], [2].

> For fun, I tried asking a bunch of AI models to figure out what shapes those arrays have. Here were the results:

Based on the results from the top 8 state-of-the-art AI models, Gemini was the best and consistently got the right results.

[1] I don't like NumPy (204 comments):

https://news.ycombinator.com/item?id=43996431

[2] I don't like NumPy: I don’t like NumPy indexing:

https://dynomight.net/numpy/
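
To give a flavour of the kind of shape puzzle in [2] (the arrays and shapes here are made up for illustration, not the exact ones from the article), advanced indexing moves broadcast index dimensions around in ways that are easy to get wrong:

    import numpy as np

    A = np.ones((10, 20, 30, 40))
    i = np.array([1, 2, 3])        # shape (3,)
    j = np.array([[0], [1]])       # shape (2, 1)

    # The index arrays broadcast together to shape (2, 3). When they are
    # separated by a slice, the broadcast dimensions move to the front of the
    # result; when they are adjacent, they stay in place.
    print(A[i, :, j].shape)      # (2, 3, 20, 40)
    print(A[i, :, :, j].shape)   # (2, 3, 20, 30)
    print(A[:, i, j, :].shape)   # (10, 2, 3, 40)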

98. jghn ◴[] No.44018387{5}[source]
I like just responding with "are you sure?" continuously. At some point you'll find it gets stuck in a local minimum/maximum and starts oscillating. Then I backtrack and look at where it wound up before that, take that solution, and go to a fresh session.
99. scottmf ◴[] No.44018617{6}[source]
Right. Look at Electron apps. They're ubiquitous despite the poorer performance and user experience because the benefits outweigh the negatives.

Maintaining a codebase isn't going to be a thing in the future, at least not in the traditional/current sense.

100. 59nadir ◴[] No.44020009{4}[source]
About 1.5-2 years ago I was using GitHub Copilot to write code, mostly as a boilerplate completer, really, because eventually I realized I spent too much time reading the suggestions and/or fixing the end result when I should've just written it completely myself. I did try it out with a pretty wide scope, i.e. letting it do more or less of the work and seeing what happened. All in all it was pretty cool; I definitely felt like there were some magic moments where it seemed to put everything together and sort of read my mind.

Anyway, that period ended and I went until a few months ago without touching anything like this and I was hearing all these amazing things about using Cursor with Claude Sonnet 3.5, so I decided to try it out with a few use cases:

1. Have it write a tokenizer and parser from scratch for a made up Clojure-like language

2. Have it write the parser for the language given the tokenizer I had already written previously

3. Have it write only single parsing functions for very specific things with both the tokenizer and parsing code to look at to see how it works

#1 was a complete and utter failure; it couldn't even put together a simple tokenizer, even when shown all of the relevant parts of the host language that would enable a reasonable end result.
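
For a sense of scale, a minimal tokenizer for a Lisp/Clojure-like syntax is only a couple of dozen lines; something roughly like this Python sketch (the token categories and regexes here are illustrative, not my actual language's):

    import re

    # Illustrative token classes: parens/brackets, strings, numbers,
    # :keywords, and symbols. Whitespace and commas are skipped, as in Clojure.
    TOKEN_SPEC = [
        ("LPAREN",  r"[(\[{]"),
        ("RPAREN",  r"[)\]}]"),
        ("STRING",  r'"(?:\\.|[^"\\])*"'),
        ("NUMBER",  r"-?\d+(?:\.\d+)?"),
        ("KEYWORD", r":[A-Za-z_+\-*/<>=!?][\w+\-*/<>=!?]*"),
        ("SYMBOL",  r"[A-Za-z_+\-*/<>=!?][\w+\-*/<>=!?.]*"),
        ("SKIP",    r"[\s,]+"),
    ]
    TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def tokenize(source: str):
        """Yield (kind, text) pairs; raise on anything unrecognised."""
        pos = 0
        while pos < len(source):
            m = TOKEN_RE.match(source, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
            pos = m.end()
            if m.lastgroup != "SKIP":
                yield m.lastgroup, m.group()

    print(list(tokenize("(defn add [a b] (+ a b))")))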

#2 was only slightly better, but the end results were nowhere near usable, and even after iteration it couldn't produce a runnable result.

#3 is the first one my previous experience with Copilot suggested should be doable. We started out pretty badly: it misunderstood one of the tokenizer functions it had examples for and used it in a way that doesn't really make sense given the example. After that it also wanted to add functions it had already added, for some reason. I ran into myriad issues just getting it to either correct itself, move on, or do something productive, until I just called it quits.

My personal conclusion from all of this is that yes, it's all incredibly incremental; any kind of "coding companion" or agent has basically the same failure modes/vectors it had years ago, and much of that hasn't improved all that much.

The odds that I could do my regular work on 3D engines with the coding companions out there are slim to none when they can't even put together something as simple as a tokenizer, or use an already existing one to write some simple tokenizer functions. For reference, I know it took my colleague, who has never written either of those things, 30 minutes to productively and correctly use exactly the same libraries the LLM was given.