
467 points | 0x63_Problems | 4 comments
perrygeo ◴[] No.42138092[source]
> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them. In other words, the penalty for having a ‘high-debt’ codebase is now larger than ever.

This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (i.e., trying to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.

Put another way, LLMs make the easy stuff easier, but royally screw up the hard stuff. The gap does appear to be widening, not shrinking. They work best where we need them the least.

replies(24): >>42138267 #>>42138350 #>>42138403 #>>42138537 #>>42138558 #>>42138582 #>>42138674 #>>42138683 #>>42138690 #>>42138884 #>>42139109 #>>42139189 #>>42140096 #>>42140476 #>>42140626 #>>42140809 #>>42140878 #>>42141658 #>>42141716 #>>42142239 #>>42142373 #>>42143688 #>>42143791 #>>42151146 #
cheald ◴[] No.42139109[source]
The niche I've found for LLMs is for implementing individual functions and unit tests. I'll define an interface and a return (or a test name and expectation) and say "this is what I want this to do", and let the LLM take the first crack at it. Limiting the bounds of the problem to be solved does a pretty good job of at least scaffolding something out that I can then take to completion. I almost never end up taking the LLM's autocompletion at face value, but having it written out to review and tweak does save substantial amounts of time.
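To make that concrete, here is a rough sketch of the kind of scaffold I mean (TypeScript/Jest assumed, names made up, not from any real project): I write the signature and the test shell, and the LLM takes the first pass at the body.

    // Hypothetical example: I supply the signature and the test name/expectation,
    // the LLM fills in the implementation, and I review and tweak before keeping it.
    import { test, expect } from "@jest/globals";

    export function slugify(title: string): string {
      // LLM's first crack at the body goes here.
      return title
        .toLowerCase()
        .trim()
        .replace(/[^a-z0-9]+/g, "-")   // collapse runs of punctuation/whitespace
        .replace(/(^-|-$)/g, "");      // strip leading/trailing dashes
    }

    // The "test name and expectation" side of the prompt:
    test("slugify collapses punctuation and whitespace into single dashes", () => {
      expect(slugify("Hello,  World!")).toBe("hello-world");
    });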

The other use case is targeted code review/improvement. "Suggest how I could improve this" fills a niche which is currently filled by linters, but can be more flexible and robust. It has its place.

The fundamental problem with LLMs is that they follow patterns, rather than doing any actual reasoning. This is essentially the observation made by the article; AI coding tools do a great job of following examples, but their usefulness is limited to the degree to which the problem to be solved maps to a followable example.

replies(3): >>42140322 #>>42143531 #>>42143847 #
1. nox101 ◴[] No.42143847[source]
Can you give some examples? What LLM? What code? What tests?

As a test I just asked "ChatGPT 4o with canvas" to "Can you write a set of tests to test glBufferData and all of its edge cases?"

glBufferData is a 32-year-old API, so there are clearly plenty of examples for it to have looked at. There are even multiple public tests for it, including the official tests, which are open source and easily scannable. It failed.

It wrote 8 tests, and 7 of them were wrong, in that they intentionally did something invalid and then asserted that no error was generated. It wasn't close to comprehensive. It didn't test that the function actually put data in the buffer, for example, nor did it check the set of valid enums to see that they work. Nor did it check that the target parameter actually works and affects the correct buffer bound to that target.
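For comparison, here is a minimal sketch of the kind of round-trip check none of the generated tests attempted (TypeScript against WebGL2, which is just one flavour of the API; illustrative only, not taken from the actual chat):

    // Did the data actually land in the buffer bound to that target?
    function roundTripCheck(gl: WebGL2RenderingContext): boolean {
      const src = new Float32Array([1, 2, 3, 4]);
      const buf = gl.createBuffer();
      gl.bindBuffer(gl.ARRAY_BUFFER, buf);
      gl.bufferData(gl.ARRAY_BUFFER, src, gl.STATIC_DRAW);

      // Read the contents back and compare against what was uploaded.
      const dst = new Float32Array(src.length);
      gl.getBufferSubData(gl.ARRAY_BUFFER, 0, dst);

      return gl.getError() === gl.NO_ERROR &&
             dst.every((v, i) => v === src[i]);
    }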

This is my experience with LLMs for code so far. I do sometimes get answers quicker from LLMs for tech questions than by searching via Google and reading Stack Overflow. But that's only sometimes. As a recent example, I was trying to add TypeScript types to some JavaScript and it failed. I went round and round telling it it had failed, but it got stuck in a loop and just kept saying "Oh, sorry. How about this" followed by a repeat of the previous code.

replies(2): >>42144893 #>>42145945 #
2. wruza ◴[] No.42144893[source]
Wait, wait. You ought to write tests for javascript react html form validation boilerplate. Not that.

/s aside, it's what we all experience too. There's a great divide between programming pre-around-2015 and thereafter. LLMs can only do recent programming, which is a product of tons of money getting loaded into the industry and creating jobs that made no sense ten years ago. Basically, the more repetitive boilerplate (patterns, configuration options, import blocks, row-obj-dto-obj conversions, typecheck bullshit) you write per day, the more LLMs help. I mean, one could abstract all that away using regular programming, but then how would they sell their work for six figures, and an AI for nine?

Just yesterday, after reading yet another “oh you must try again” comment, I asked 4o how to stop puppeteer from dumping errors into the console and exit gracefully when I close the headful browser (all logs and code provided). Right away it slid into nonsense. I always finish my chats with what I think of it, uncut, just in case someone uses these for further learning.
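For what it's worth, a sketch of the general shape an answer could have taken (TypeScript, using puppeteer's documented "disconnected" event; whether it fits the exact setup above is an open question):

    // Listen for the headful browser going away and stop cleanly,
    // instead of letting pending calls reject into the console.
    import puppeteer from "puppeteer";

    const browser = await puppeteer.launch({ headless: false });
    browser.on("disconnected", () => {
      // The user closed the window; exit instead of erroring out.
      process.exit(0);
    });

    try {
      const page = await browser.newPage();
      await page.goto("https://example.com");
    } catch (err) {
      // Calls in flight when the window closes will reject; swallow them here.
    }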

replies(1): >>42231599 #
3. Aeolun ◴[] No.42145945[source]
If you asked me to write tests from such a vague definition, I'd also have trouble writing them. It'll work a lot better if you tell it what you want it to validate, I think.
4. ◴[] No.42231599[source]