
AI 2027

(ai-2027.com)
949 points by Tenoke | 1 comment
amarcheschi ◴[] No.43572698[source]
I just spent some time trying to make Claude and Gemini make a violin plot of some Polars dataframe. I've never used it and it's just for prototyping, so I just went "apply a log to the values and make a violin plot of this polars dataframe". And I had to iterate with them 4/5 times each. Gemini got it right, but then used deprecated methods

I might be doing LLMs wrong, but I just can't get how people actually do something non-trivial just by vibe coding. And it's not like I'm an old fart either; I'm a university student

replies(5): >>43572975 #>>43573061 #>>43573105 #>>43573520 #>>43576238 #
hiq ◴[] No.43573520[source]
> had to iterate with them for 4/5 times each. Gemini got it right but then used deprecated methods

How hard would it be to automate these iterations?

How hard would it be to automatically check and improve the code to avoid deprecated methods?
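One low-tech version of that automated check, sketched with a hypothetical `check_snippet` helper: run the generated snippet in a subprocess with deprecation warnings promoted to errors, and feed any stderr back into the next prompt.

```python
import subprocess
import sys
import tempfile


def check_snippet(code: str) -> tuple[bool, str]:
    """Run a generated snippet with DeprecationWarnings treated as errors.

    Returns (ok, stderr) so a caller can append the failure output to the
    next prompt and let the model iterate without a human in the loop.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(
        [sys.executable, "-W", "error::DeprecationWarning", path],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return proc.returncode == 0, proc.stderr
```

A caller would loop: generate, run `check_snippet`, and on failure resubmit the prompt with the stderr attached, up to some retry budget.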

I agree that most products are still underwhelming, but that doesn't mean the underlying tech isn't already enough to deliver better LLM-based products. Lately I've been using LLMs more and more to get started with writing tests on components I'm not familiar with, and it really helps.

replies(3): >>43575091 #>>43575868 #>>43576566 #
jaccola ◴[] No.43575868[source]
How hard can it be to create a universal "correctness" checker? Pretty damn hard!

Our notion of "correct" for most things is basically derived from a very long training run on reality, with the loss function being how long a gene propagated.

replies(1): >>43603660 #
hiq ◴[] No.43603660[source]
You don't need a full correctness checker to get a useful product though. New code generated by the current generation of LLMs, which also compiles and passes existing tests, is likely to be somewhat useful in my experience. The problem is that we still get too much code that doesn't pass these basic requirements.