
466 points | 0x63_Problems | 2 comments
perrygeo | No.42138092
> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them. In other words, the penalty for having a ‘high-debt’ codebase is now larger than ever.

This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (i.e., you try to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.

Put another way, LLMs make the easy stuff easier but royally screw up the hard stuff. The gap does appear to be widening, not shrinking. They work best where we need them the least.

cheald | No.42139109
The niche I've found for LLMs is for implementing individual functions and unit tests. I'll define an interface and a return (or a test name and expectation) and say "this is what I want this to do", and let the LLM take the first crack at it. Limiting the bounds of the problem to be solved does a pretty good job of at least scaffolding something out that I can then take to completion. I almost never end up taking the LLM's autocompletion at face value, but having it written out to review and tweak does save substantial amounts of time.
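A minimal sketch of that "define the interface and the test, let the model draft the body" workflow, assuming Python with plain `assert`-style tests. `slugify` is a hypothetical function picked for illustration: the signature, docstring, and test are the parts you'd write by hand, and the body stands in for the LLM's first draft that you then review and tweak.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Convert a title to a lowercase, hyphen-separated URL slug.

    Hand-written: the signature and docstring (the "interface and return").
    The body below stands in for the LLM's first draft, to be reviewed.
    """
    # Strip accents: decompose characters, then drop the non-ASCII marks.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", ascii_only).strip("-")
    return slug.lower()


# Hand-written: the test name and expectation the draft has to satisfy.
def test_slugify_collapses_punctuation_and_accents():
    assert slugify("Héllo, World!") == "hello-world"
    assert slugify("  Already--slugged  ") == "already-slugged"


if __name__ == "__main__":
    test_slugify_collapses_punctuation_and_accents()
    print("ok")
```

Keeping the problem this narrow (one function, one named expectation) is what makes the draft easy to check: either the test passes or you know exactly which expectation the model missed.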

The other use case is targeted code review/improvement. "Suggest how I could improve this" fills a niche which is currently filled by linters, but can be more flexible and robust. It has its place.

The fundamental problem with LLMs is that they follow patterns, rather than doing any actual reasoning. This is essentially the observation made by the article; AI coding tools do a great job of following examples, but their usefulness is limited to the degree to which the problem to be solved maps to a followable example.

MarcelOlsz | No.42140322
Can't tell you how much I love it for testing; it's basically the only thing I use it for. I now have a test suite that can rebuild my entire app from the ground up locally, and it works in the cloud as well. It's actually a huge motivator: I write a piece of code, and the reward is being able to send it to the LLM to create some tests and then seeing a nice stream of green checkmarks.
highfrequency | No.42140879
> I now have a test suite that can rebuild my entire app from the ground up

What does this mean?

MarcelOlsz | No.42141059
Sorry, I should have been clearer. Firebase is (or was) a PITA when I started the app I'm working on a few years ago. I have a lot of records in my DB that I need to validate after normalizing the data. I used to have an admin page that spat out a bunch of JSON data with some basic filtering and self-rolled testing that I could verify at a glance.

After a few years off from this project, I refactored it all, and part of that refactoring was building a test suite that I can run. When run, it rebuilds, normalizes, and verifies all the data in my app (scraped data).

When I deploy, it also runs these tests and emails me if something breaks, but it skips the seeding portion.

I had plans to do this before, but the Firebase emulator still had a lot of issues a few years ago. Refactoring this project gave me the freedom to finally build a proper testing environment and have my entire app make full use of my local Firebase emulator without issue.
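A rough sketch of the rebuild, normalize, and verify flow described above, with a switch to skip seeding on deploy. This is not the poster's code: an in-memory list stands in for the Firebase emulator, and `Record`, the price-parsing rule, and every function name are hypothetical. The email alert is reduced to a `print`.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    price: str  # hypothetical raw scraped value, e.g. " $1,299.00 "


def seed_records() -> list[Record]:
    """Stand-in for re-scraping/loading raw data into a local emulator."""
    return [Record(" Widget ", " $1,299.00 "), Record("gadget", "$15")]


def normalize(record: Record) -> dict:
    """Hypothetical normalization: trim/lowercase names, parse prices to cents."""
    cents = round(float(record.price.strip().lstrip("$").replace(",", "")) * 100)
    return {"name": record.name.strip().lower(), "price_cents": cents}


def verify(row: dict) -> list[str]:
    """Return the problems found in one normalized row (empty list = valid)."""
    problems = []
    if not row["name"]:
        problems.append("empty name")
    if row["price_cents"] <= 0:
        problems.append(f"non-positive price for {row['name']!r}")
    return problems


def run_suite(existing: list[Record], seed: bool) -> list[str]:
    """Optionally rebuild, then normalize and verify every record.

    Locally, seed=True rebuilds from scratch. On deploy, seed=False skips
    seeding and only normalizes and verifies the data already present.
    """
    records = seed_records() if seed else existing
    failures = []
    for rec in records:
        failures.extend(verify(normalize(rec)))
    return failures


if __name__ == "__main__":
    failures = run_suite(existing=[], seed=True)
    if failures:
        # In a deploy hook, this is where an alert email would be sent.
        print("FAILED:", failures)
    else:
        print("all records valid")
```

Returning a list of failures instead of stopping at the first one matches the "run it, then email if something breaks" use: one run reports every bad record at once.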

I like giving it my test cases in plain English. It still gets them wrong sometimes, but 90% of the time they're good to go.