
466 points 0x63_Problems | 29 comments
perrygeo ◴[] No.42138092[source]
> Companies with relatively young, high-quality codebases benefit the most from generative AI tools, while companies with gnarly, legacy codebases will struggle to adopt them. In other words, the penalty for having a ‘high-debt’ codebase is now larger than ever.

This mirrors my experience using LLMs on personal projects. They can provide good advice only to the extent that your project stays within the bounds of well-known patterns. As soon as your codebase gets a little bit "weird" (ie trying to do anything novel and interesting), the model chokes, starts hallucinating, and makes your job considerably harder.

Put another way, LLMs make the easy stuff easier, but royally screw up the hard stuff. The gap appears to be widening, not shrinking. They work best where we need them the least.

replies(24): >>42138267 #>>42138350 #>>42138403 #>>42138537 #>>42138558 #>>42138582 #>>42138674 #>>42138683 #>>42138690 #>>42138884 #>>42139109 #>>42139189 #>>42140096 #>>42140476 #>>42140626 #>>42140809 #>>42140878 #>>42141658 #>>42141716 #>>42142239 #>>42142373 #>>42143688 #>>42143791 #>>42151146 #
cheald ◴[] No.42139109[source]
The niche I've found for LLMs is for implementing individual functions and unit tests. I'll define an interface and a return (or a test name and expectation) and say "this is what I want this to do", and let the LLM take the first crack at it. Limiting the bounds of the problem to be solved does a pretty good job of at least scaffolding something out that I can then take to completion. I almost never end up taking the LLM's autocompletion at face value, but having it written out to review and tweak does save substantial amounts of time.

The other use case is targeted code review/improvement. "Suggest how I could improve this" fills a niche which is currently filled by linters, but can be more flexible and robust. It has its place.

The fundamental problem with LLMs is that they follow patterns, rather than doing any actual reasoning. This is essentially the observation made by the article; AI coding tools do a great job of following examples, but their usefulness is limited to the degree to which the problem to be solved maps to a followable example.

replies(3): >>42140322 #>>42143531 #>>42143847 #
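The workflow described above (human writes the signature and the test, the LLM drafts the body for review) might look like this minimal sketch. All names here are hypothetical, invented for illustration, not taken from the thread:

```typescript
// Human-written contract: a signature plus a test naming the expectation.
// The body below stands in for an LLM's first crack, kept only after review.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics
    .replace(/^-+|-+$/g, "");    // strip leading/trailing dashes
}

// Human-written tests that bounded the problem before any code existed.
console.assert(slugify("  Hello, World!  ") === "hello-world");
console.assert(slugify("Déjà vu") === "d-j-vu"); // naive ASCII-only behavior
```

The point of the pattern is that the human keeps ownership of the contract and the expectations; only the body is delegated and then reviewed.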
1. MarcelOlsz ◴[] No.42140322[source]
Can't tell you how much I love it for testing, it's basically the only thing I use it for. I now have a test suite that can rebuild my entire app from the ground up locally, and works in the cloud as well. It's a huge motivator actually to write a piece of code with the reward being the ability to send it to the LLM to create some tests and then seeing a nice stream of green checkmarks.
replies(3): >>42140464 #>>42140879 #>>42143641 #
2. highfrequency ◴[] No.42140879[source]
> I now have a test suite that can rebuild my entire app from the ground up

What does this mean?

replies(1): >>42141059 #
3. PaulHoule ◴[] No.42140932[source]
I had Codeium add something to a function that added a new data value to an object. Unbidden, it wrote three new tests, and good ones. I wrote my own test by cutting and pasting a test it wrote, with a modification; it pointed out that I didn't edit the comment, so I told it to do so.

It also screwed up the imports of my tests pretty badly: some imports that worked before got changed for no good reason. It replaced the JetBrains NotNull annotation with a totally different annotation.

It was able to figure out how to update a DAO object when I added a new field. But it got the type of the field wrong when updating the object corresponding to a row in that database column, even though it wrote the Liquibase migration and should have known the type; we had chatted plenty about that migration.

It got many things right but I had to fix a lot of mistakes. It is not clear that it really saves time.

replies(2): >>42141053 #>>42141069 #
4. imp0cat ◴[] No.42141053{3}[source]
Let's be clear here: Codeium kinda sucks. Yeah, it's free and it works, somewhat, but I wouldn't trust it much.
5. MarcelOlsz ◴[] No.42141059[source]
Sorry, should have been more clear. Firebase is (or was) a PITA when I started the app I'm working on a few years ago. I have a lot of records in my db that I need to validate after normalizing the data. I used to have an admin page that spit out a bunch of json data with some basic filtering and self-rolled testing that I could verify at a glance.

After a few years away from this project, I refactored it all, and part of that refactoring was building a test suite that I can run. When run, it rebuilds, normalizes, and verifies all the data in my app (scraped data).

When I deploy, it will also run these tests and then email if something breaks, but skip the seeding portion.

I had plans to do this before, but the Firebase emulator still had a lot of issues a few years ago. Refactoring this project gave me the freedom to finally build a proper testing environment and make my entire app make full use of my local Firebase emulator without issue.

I like giving it my test cases in plain English. It still gets them wrong sometimes, but 90% of the time they are good to go.
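The seed/normalize/verify suite described above, with a skip-seed switch for deploys, might be sketched roughly like this. All types and names are hypothetical, and the email-on-failure step is omitted:

```typescript
// Hypothetical sketch of a seed -> normalize -> verify test suite.
// Locally the suite rebuilds the data from scratch; on deploy, seeding
// is skipped and only verification runs against the live data.
type Row = { id: string; name: string };

function seed(db: Map<string, Row>): void {
  db.set("1", { id: "1", name: "  Alice " }); // raw, un-normalized scraped data
}

function normalize(db: Map<string, Row>): void {
  for (const row of db.values()) row.name = row.name.trim();
}

function verify(db: Map<string, Row>): boolean {
  // every record must be non-empty and already normalized
  return [...db.values()].every((r) => r.name.length > 0 && r.name === r.name.trim());
}

function runSuite(db: Map<string, Row>, opts: { skipSeed: boolean }): boolean {
  if (!opts.skipSeed) seed(db); // deploy runs pass skipSeed: true
  normalize(db);
  return verify(db);
}

console.assert(runSuite(new Map(), { skipSeed: false }) === true);
```

The one-flag split between "rebuild everything locally" and "verify only on deploy" is what lets the same suite serve both as a local rebuild tool and a post-deploy health check.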

6. MarcelOlsz ◴[] No.42141069{3}[source]
Try using Cursor with the latest claude-3-5-sonnet-20241022.
replies(2): >>42141154 #>>42148341 #
7. MarcelOlsz ◴[] No.42141096[source]
It's containerized and I have a script that takes care of everything from the ground up :) I've tested this on multiple OSes and friends' computers. I'm thankful to old me for writing a readme for current me, lol.

>Please tell me this is satire.

No. I started doing TDD. It's fun to think about a piece of functionality, write out some tests, and then slowly make them pass. It removes a lot of cognitive load for me and serves as a micro todo. It's also nice that when you're working and think of something to add, you can just write a quick test for it and add it to the kanban later.

I can't tell you how many times I've worked on projects that are gigantic in complexity and don't do any testing, or use typescript, or both. You're always like 25% paranoid about everything you do and it's the worst.

replies(1): >>42145900 #
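The TDD micro-loop described above can be sketched minimally (names hypothetical): write the test first, watch it fail, then add just enough code to turn the checkmarks green.

```typescript
// Step 1: the test, written before any implementation exists.
function testParsePrice(): void {
  console.assert(parsePrice("$1,299.99") === 1299.99, "strips $ and commas");
  console.assert(parsePrice("free") === 0, "non-numeric input maps to 0");
}

// Step 2: the minimal implementation that makes the tests pass.
function parsePrice(text: string): number {
  const n = Number(text.replace(/[$,]/g, ""));
  return Number.isNaN(n) ? 0 : n;
}

testParsePrice();
```

Each test doubles as a micro todo item: the expectation is written down first, and the code exists only to satisfy it.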
8. PaulHoule ◴[] No.42141154{4}[source]
Unfortunately I "think different" and use Windows. I use Microsoft Copilot and would say it is qualitatively similar to Codeium in quality; a real quantitative eval would be a lot of work.
replies(1): >>42141165 #
9. MarcelOlsz ◴[] No.42141165{5}[source]
Cursor (cursor.com) is just a VS Code wrapper, so it should work fine on Windows. If you're already in the AI coding space, I seriously urge you to at least give it a go.
replies(1): >>42141385 #
10. PaulHoule ◴[] No.42141385{6}[source]
I'll look into it.

I'll add that my experience with the Codeium plugin for IntelliJ is night-and-day different from the Windsurf editor from Codeium.

The first one "just doesn't work" and struggles to see files that are in my project; the second basically works.

replies(1): >>42141405 #
11. MarcelOlsz ◴[] No.42141405{7}[source]
You can also look into https://www.greptile.com/ to ask codebase questions. There's so many AI coding tools out there now. I've heard good things about https://codebuddy.ca/ as well (for IntelliJ) and https://www.continue.dev/ (also for IntelliJ).

>The first one "just doesn't work"

Haha. You're on a roll.

12. rr808 ◴[] No.42143641[source]
I struggle to get GitHub Copilot to create any unit tests that provide any value. How do you get it to create really useful tests?
replies(2): >>42144840 #>>42144903 #
13. BillyTheKing ◴[] No.42144840[source]
Would recommend trying out Anthropic's Sonnet 3.5 for this one; it usually generates decent unit tests for reasonably sized functions.
14. MarcelOlsz ◴[] No.42144903[source]
I use claude-3-5-sonnet-20241022 with a very explicit .cursorrules file with the cursor editor.
replies(1): >>42145845 #
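A .cursorrules file is plain-text instructions that Cursor includes with requests to the model. A hypothetical excerpt aimed at test generation (the rules below are invented for illustration, not from the thread) might look like:

```
# Hypothetical .cursorrules excerpt for test generation
- Write tests first; cover one observable behavior per test.
- Name each test in plain English, describing the expectation.
- Do not modify source files when asked only for tests.
- Prefer small, deterministic tests; no network calls.
```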
15. ponector ◴[] No.42145845{3}[source]
Can you share your .cursorrules? For me cursor is not much better than autocomplete, but I'm writing mostly e2e tests.
replies(1): >>42146031 #
16. sarchertech ◴[] No.42145900{3}[source]
>It's a huge motivator actually to write a piece of code with the reward being the ability to send it to the LLM to create some tests and then seeing a nice stream of green checkmarks.

Yeah that’s not TDD.

replies(1): >>42146001 #
17. MarcelOlsz ◴[] No.42146001{4}[source]
Don't you have a book to get to writing instead of leaving useless comments? Haha.
replies(1): >>42146183 #
18. MarcelOlsz ◴[] No.42146031{4}[source]
You can find a bunch on https://cursor.directory/.
19. sarchertech ◴[] No.42146183{5}[source]
More importantly I have a French cleat wall to finish, a Christmas present to make for my wife, and a toddler and infant to keep from killing themselves.

But I also have a day job and I can’t even begin to imagine how much extra work someone doing “TDD” by writing a function and then fixing it in place with a whole suite of generated tests would cause me.

I’m fine with TDD. I do it myself fairly often. I also go back in and delete the tests that I used to build it that aren’t actually going to be useful a year from now.

replies(1): >>42146498 #
20. MarcelOlsz ◴[] No.42146498{6}[source]
Like I said above, I like the ability to scaffold tests using english and tweaking from there. I'm still not sure what point you're trying to make.
replies(1): >>42148139 #
21. sarchertech ◴[] No.42148139{7}[source]
Your original point was that it was great to “write some code then send it to the LLM to create tests.”

That’s not test driven development.

replies(1): >>42151036 #
22. lubujackson ◴[] No.42148341{4}[source]
Seconding Cursor. I have a friend who used Copilot 6 months ago and found it vaguely helpful... but I turned him on to Cursor and it's a whole new ballgame.

It's a cross between actually useful autocomplete, a personalized StackOverflow, and error diagnosis (just paste an error message in chat). I know I'm only scratching the surface of its usefulness and I pretty much never make changes across multiple files, but I definitely see firm net positives at this point.

23. MarcelOlsz ◴[] No.42151036{8}[source]
Sure if you want to take the absolute least charitable interpretation of what I said lol.
replies(1): >>42161327 #
24. sarchertech ◴[] No.42161327{9}[source]
“write a piece of code with the reward being the ability to send it to the LLM to create some tests and then seeing a nice stream of green checkmarks”

You write code, then you send the code to the LLM to create tests for you.

How can this possibly be interpreted to mean the reverse?

That you write tests first by asking the LLM in English to help you without “sending the code” you wrote because you haven’t written it yet. Then you use those tests to help you write the code.

Now if you misspoke then my comment isn’t relevant to your situation, but don’t pretend that I somehow interpreted what you said uncharitably. There’s no other way to interpret it.

replies(1): >>42165498 #
25. MarcelOlsz ◴[] No.42165498{10}[source]
Ok you win.
replies(1): >>42165933 #
26. sarchertech ◴[] No.42165933{11}[source]
Thanks
replies(2): >>42166456 #>>42166489 #
27. ◴[] No.42166456{12}[source]
28. ◴[] No.42166489{12}[source]
29. dang ◴[] No.42167974[source]
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

https://news.ycombinator.com/newsguidelines.html