
181 points thunderbong | 11 comments
mycentstoo ◴[] No.45083181[source]
I believe choosing a well-known problem space in a well-known language certainly influenced a lot of the behavior. An AI's usefulness is strongly correlated with its training data, and there's no doubt been a significant amount of data about both this problem space and Python.

I’d love to see how this compares when either the problem space is different or the language/ecosystem is different.

It was a great read regardless!

replies(5): >>45083320 #>>45085533 #>>45086752 #>>45087639 #>>45092126 #
1. Insanity ◴[] No.45083320[source]
100% this. I tried Haskelling with LLMs and its performance was noticeably worse than with Go.

Although in fairness this was a year ago on GPT 3.5 IIRC

replies(6): >>45083408 #>>45083590 #>>45083706 #>>45085045 #>>45085275 #>>45085640 #
2. danielbln ◴[] No.45083408[source]
Post-training in all frontier models has improved significantly with respect to programming language support. Take Elixir, which LLMs could barely handle a year ago, but now support has gotten really good.
3. diggan ◴[] No.45083590[source]
> Although in fairness this was a year ago on GPT 3.5 IIRC

GPT-3.5 was impressive at the time, but today's SOTA (like GPT-5 Pro) is almost a night-and-day difference, both in terms of producing better code for a wider range of languages (I mostly do Rust and Clojure; it handles those fine now, but was awful with 3.5) and, more importantly, in terms of following your instructions in user/system prompts, so it's easier to get higher-quality code from it now, as long as you can put into words what "higher quality code" means for you.

4. r_lee ◴[] No.45083706[source]
I'm not sure I'd say "100% this" if I was talking about GPT 3.5...
replies(2): >>45084580 #>>45085309 #
5. verelo ◴[] No.45084580[source]
Yeah, 3.5 was good when it came out, but frankly anyone reviewing AI for coding who isn't using Sonnet 4.1, GPT-5, or equivalent really isn't aware of what they've been missing out on.
6. johnisgood ◴[] No.45085045[source]
I wrote some Haskell using Claude. It was great.
7. ocharles ◴[] No.45085275[source]
I write Haskell with Claude Code and it's gotten remarkably good recently. We have some code at work that uses STM to implement what is essentially a mutable state machine. I needed to split a state transition apart, and it did an admirable job; I had to intervene once or twice when it was going down a valid but undesirable approach. This almost-one-shot performance was already a productivity boost, but the result didn't quite build. What I find most impressive now is that the "fix" here is to literally have Claude run the build and see the errors. While GHC errors are verbose and not always the best, it got everything building in a few more iterations. When it later hit a test failure, I suggested we add a bit more logging, so it logged all state transitions, spotted the unexpected transition, and got the test passing. We really are a LONG way from 3.5 performance.
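
To make the pattern concrete, here is a minimal, hypothetical sketch of an STM-backed state machine with logged transitions, in the spirit of what the comment above describes. The states, events, and function names are illustrative assumptions, not the commenter's actual code:

    module Main where

    import Control.Concurrent.STM

    -- Hypothetical states and events, for illustration only.
    data State = Idle | Running | Stopped deriving (Show, Eq)
    data Event = Start | Stop             deriving (Show)

    -- Pure transition function; Nothing marks an event that is
    -- invalid in the current state.
    step :: State -> Event -> Maybe State
    step Idle    Start = Just Running
    step Running Stop  = Just Stopped
    step _       _     = Nothing

    -- Apply an event atomically and hand back (old, result) so the
    -- caller can log every attempted transition.
    transition :: TVar State -> Event -> IO (State, Maybe State)
    transition var ev = atomically $ do
      old <- readTVar var
      case step old ev of
        Just new -> writeTVar var new >> pure (old, Just new)
        Nothing  -> pure (old, Nothing)

    main :: IO ()
    main = do
      machine <- newTVarIO Idle
      -- The last event is invalid and shows up in the log instead of
      -- silently changing state.
      mapM_ (run machine) [Start, Stop, Start]
      where
        run var ev = do
          (old, res) <- transition var ev
          putStrLn $ case res of
            Just new -> show old ++ " --" ++ show ev ++ "--> " ++ show new
            Nothing  -> "unexpected event " ++ show ev ++ " in state " ++ show old

Logging the (old, new) pair on every attempted transition is what makes an unexpected transition, like the one in the anecdote above, easy to spot in test output.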
8. Insanity ◴[] No.45085309[source]
Yeah, that's a fair point. I had assumed it'd remain relatively similar, given that the training data would be smaller for languages like Haskell versus languages like Python and JavaScript.
9. computerex ◴[] No.45085640[source]
3.5 was a joke at coding compared to Sonnet 4.
replies(2): >>45086680 #>>45087723 #
10. Insanity ◴[] No.45086680[source]
Yup, fair point; it's been some time. Although vibe coding is more "miss" than "hit" for me.
11. pizza ◴[] No.45087723[source]
It's so thrilling that this is actually true in just a year