413 points martinald | 31 comments
1. Volundr ◴[] No.46205257[source]
FTA

> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool. This would take me, or many developers I know and respect, days to write by hand.

I have no problem believing that Claude generated 300 passing tests. I have a very hard time believing those tests were all well thought out, concise, and actually testing the desired behavior while communicating to the next person or agent how the system under test is supposed to work. I'd give very good odds at least some of those tests are subtly testing themselves (e.g., mocking a function, calling said function, then asserting the mock was called). Many of them are probably also testing implementation details that were never intended to be part of the contract.
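A minimal Python sketch of the "test that tests itself" pattern described above, next to a test that exercises real behavior. Everything here (PriceService, BrokenPriceService, both test functions) is a hypothetical name made up for illustration and is not taken from the article or from any Claude output.

```python
from unittest.mock import patch


class PriceService:
    """Hypothetical system under test."""

    def fetch_rate(self, currency):
        # Stands in for a network call that tests should not make.
        raise RuntimeError("network not available in tests")

    def convert(self, amount, currency):
        return amount * self.fetch_rate(currency)


class BrokenPriceService(PriceService):
    """Same service with a deliberate bug in convert (adds instead of multiplies)."""

    def convert(self, amount, currency):
        return amount + self.fetch_rate(currency)


def tautological_test(service_cls):
    # Mocks fetch_rate, calls the mock directly, then asserts the mock was called.
    # No production logic runs, so this cannot fail no matter what convert does.
    with patch.object(PriceService, "fetch_rate", return_value=2.0) as fake:
        service_cls().fetch_rate("EUR")
        fake.assert_called_once_with("EUR")


def behavioral_test(service_cls):
    # Mocks only the external dependency and asserts on the observable result.
    with patch.object(PriceService, "fetch_rate", return_value=2.0):
        assert service_cls().convert(10, "EUR") == 20.0
```

Under these assumptions, tautological_test passes for both classes, while behavioral_test passes for PriceService and raises AssertionError for BrokenPriceService. Both would look identical in a "300 tests, all green" summary.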

I'm not anti-AI, I use it regularly, but all of these articles about how crazy productive it is skip over the crazy amount of supervision it needs. Yes, it can spit out code fast, but unless you're prepared to spend a significant chunk of that "saved" time CAREFULLY (more carefully than with a human) reviewing code, you've accepted a big drop in quality.

replies(7): >>46205349 #>>46205526 #>>46205624 #>>46206683 #>>46206705 #>>46208955 #>>46214506 #
2. kllrnohj ◴[] No.46205349[source]
Anecdotes etc etc but the AI tests I've been sent to review have been absolute shit. Stuff like checking that just calling a function doesn't crash the program. No assertions other than "end of test method reached".

Yes sometimes those tests are necessary, but it seemed to just do it everywhere because it made the code coverage percentage go up. Even though it was useless.

I have also had great experiences with AI cranking out straightforward boilerplate or asking C++ template metaprogramming questions. It's not all negative. But net-net it feels like it takes more work in total to use AI as you have to learn to recognize when it just won't handle the task, which can happen a lot. And you need to keep up with what it did enough to be able to take over. And reading code is harder than writing it.

replies(1): >>46206422 #
3. rkozik1989 ◴[] No.46205526[source]
The benefit of having a team of QA engineers create tests is their differing perspectives, so with LLMs being trained to act like affirmation engines you have to wonder how that impacts the test cases they create. It's the problem of LLMs being miserable at critiques manifesting itself in a different way.

However, in saying that, I am by no means an AI hater, but rather I just want models to be better than they currently are. I am tired of the tech demos and benchmark stats that don't really mean much aside from impressing someone who's not in a critical thinking mindset.

4. jf22 ◴[] No.46205624[source]
> you've accepted a big drop in quality.

Right, but you do it in a 10th of the time.

replies(2): >>46205955 #>>46206771 #
5. WesleyJohnson ◴[] No.46205955[source]
So you're openly saying you're fine with quantity over quality... in software engineering? That's fine for an MVP, maybe, but nothing beyond that IMHO unless they're throwaway scripts.

"Houston, we have a problem."

"Yeah, but we did it in a 10th of the time"

replies(3): >>46206575 #>>46207157 #>>46210494 #
6. piperswe ◴[] No.46206422[source]
I’ve seen agents produce plenty of those tests, but recently I’ve seen them generate some actually decent unit tests that I wouldn’t have thought of myself. It’s a bit of a crapshoot.
7. TemptedMuse ◴[] No.46206575{3}[source]
Here is the thing, most software engineers are not designing rockets, they are making basic CRUD apps. If there is a minor defect it can be caught and corrected without much issue. Our jobs are a lot less "critical infrastructure" than a lot of software engineers will allow their egos to accept.

Sure, if you are making some medical surgery robot, do it right, but if you are making a website that recommends wine pairings, who cares if one of the buttons has a weird animation bug that doesn't even get noticed for a couple of years.

replies(1): >>46206734 #
8. klysm ◴[] No.46206683[source]
Very similar experience here. I have not once managed to get an LLM to generate good tests, even for very simple code. It generally writes tautologies that will pass with high confidence.
9. stocksinsmocks ◴[] No.46206705[source]
Which is the better value:

Hundreds of tests that were written basically for free in a few minutes even though a lot of them are kind of dumb?

Or hundreds of tests that were written for a five figure sum that took weeks or months, and only some of them are kind of dumb?

If you’re just thinking of code as the end in and of itself, then of course, the handcrafted artisanal product is better. If you think of code like an owner, an incidental expense towards solving a problem that has value, then cheap and disposable wins every time. We can throw our hands up about “quality” and all that, but that baby was thrown out with the bathwater a very, very long time ago. The modern Web is slower than the older web. Desktop applications are just web browsers. Enterprise software barely works. Windows 11 happened. I don’t think anybody even bothers to scrutinize their dependency chains except for, I don’t know, like maybe missile guidance or something. And I just want to say Claude is not responsible for any of this. You humans are.

replies(1): >>46209841 #
10. dlisboa ◴[] No.46206734{4}[source]
I think I'm "most" engineers and I haven't ever worked on something that was "just" a CRUD app. Having a DB behind your web app doesn't make it "just" a CRUD.

It's really overestimated how many simple apps exist.

replies(1): >>46207165 #
11. bagacrap ◴[] No.46206771[source]
I mean, just say you view unit testing as nothing more than a checkbox.
replies(1): >>46207181 #
12. jf22 ◴[] No.46207157{3}[source]
> quantity over quality

Yes? Making quality concessions for more code or features is part of the job.

13. jf22 ◴[] No.46207165{5}[source]
What kind of apps do you work on?
replies(1): >>46207245 #
14. jf22 ◴[] No.46207181{3}[source]
I don't know why you'd think that.

200 decent unit tests are better than zero unit tests.

replies(1): >>46209757 #
15. dlisboa ◴[] No.46207245{6}[source]
Regular SaaS products of different kinds, cloud software, hosting software, etc. Really representative of most of the Web-enabled software out there.

For every one of them there has been an almost negligible amount of CRUD code; the meat of every one of those apps was very specific business logic. Some were also heavy on the frontend with an equal amount of complexity on the backend. As a senior/staff level engineer you also have to dive into other things like platform enablement, internal tooling, background jobs and data wrangling, distributed architectures, etc., which are even farther from CRUD.

replies(2): >>46207551 #>>46207973 #
16. jf22 ◴[] No.46207551{7}[source]
A fancy CRUD app is still a CRUD app.
replies(1): >>46207853 #
17. dlisboa ◴[] No.46207853{8}[source]
Yes, like a guided missile is a fancy firecracker.
replies(1): >>46208079 #
18. TemptedMuse ◴[] No.46207973{7}[source]
That is just CRUD with buzzword soup around it.
19. TemptedMuse ◴[] No.46208079{9}[source]
Not to call you out but this is exactly what I meant when I said software engineers have egos that will not let them accept that they are not designing critical stuff.

Comparing your cloud based CRUD app to a missile is a perfect illustration. There is no dishonor in admitting that our stuff isn't going to kill anyone if there is a bug. Don't write bad code, but also sometimes just getting something out the door is much better than perfect quality (bird in the hand and all that).

replies(2): >>46208196 #>>46208886 #
20. dlisboa ◴[] No.46208196{10}[source]
Not to call you out either but it seems you have really no idea what a basic CRUD app is. Which is fine, I guess not everyone likes to read the base definitions of these things. It's clear I replied to the wrong person as we don't have a shared understanding of complexity.
21. instig007 ◴[] No.46208886{10}[source]
> software engineers have egos that will not let them accept that they are not designing critical stuff

> Don't write bad code, but also sometimes just getting something out the door is much better than perfect quality (bird in the hand and all that).

Your bank account can be represented as a CR app, it's two letters short of CRUD, but it doesn't make it simple or simpler in any sense of the words.

Now the question: how tolerant are you of bugs in your bank account? How often can they happen before you complain?

replies(1): >>46209072 #
22. a_rana ◴[] No.46208955[source]
Hard disagree! Review is still an important part but more and more, I find myself agreeing with the LLM generated changes ~90% of the time.
23. TemptedMuse ◴[] No.46209072{11}[source]
Banking software is critical, but guess what, most software engineers are not writing banking software. I never said no software engineers write critical code. Heck, I'd argue most will, at some point in their careers, write something that needs to be as bug free as possible.

My point is that for most software engineering getting a product out is more important than a super high quality bar that slows everything down.

If you are writing banking software or flight control systems please do it with care, if you are making some React based recipe website or something I don't really care (99% of software engineering falls into this latter category in my opinion).

Software engineers need to get over themselves a bit, AI really exposed how many were just getting by making repetitive junk and thinking they were special.

replies(1): >>46209719 #
24. instig007 ◴[] No.46209719{12}[source]
> most software engineers are not writing banking software

Many software engineers write software for people who won't like the idea that their request/case can be ignored/failed/lost, when expressed openly on the front page of your business offering. Are bookings important enough? Are gifts for significant events important? Maybe you're okay with losing my code commits every once in a while, I don't know. And I'm not sure why you think it's okay to spread this bad management idea of "not valuable or critical enough" among engineers who should know better and who should keep sources of bad ideas at bay when it comes to software quality in general.

25. welshwelsh ◴[] No.46209757{4}[source]
The main benefit of writing tests is that it forces the developer to think about what they just wrote and what it is supposed to do. I often will find bugs while writing tests.

I've worked on projects with 2,000+ unit tests that are essentially useless, often fail when nothing is wrong, and rarely detect actual bugs. It is absolutely worse than having 0 tests. This is common when developers write tests to satisfy code coverage metrics, instead of in an effort to make sure their code works properly.

replies(1): >>46210712 #
26. welshwelsh ◴[] No.46209841[source]
Neither. Tests should be written by developers only when it saves them time. The cost of writing them should be negative.

Instead of writing hundreds of useless tests so that the code coverage report shows high numbers, it is better to write a couple dozen tests based on business needs and code complexity.

replies(1): >>46211540 #
27. bluesnowmonkey ◴[] No.46210494{3}[source]
Of course it's fine for any project.

There is exactly one "best" programmer in the world, and at this moment he/she is working on at most one project. Every other project in the world is accepting less than the "best" possible quality. Yes... in software engineering.

As soon as you sat down at the keyboard this morning, your employer accepted a sacrifice in quality for the sake of quantity. So did mine. Because neither one of us is the best. They could have hired someone better but they hired you and they're fine with that. They'd rather have the code you produce today than not have it.

It's the same for an AI. It could produce some code for you, right now, for nearly free. Would you rather have that code or not have it? It depends on the situation, yeah not always but sometimes it's worth having.

replies(1): >>46220938 #
28. jf22 ◴[] No.46210712{5}[source]
Look, you tell the LLMs what kind of tests you want and judge the quality before committing.

If you're letting the LLM create useless tests, that's on you.

I think you're reading these comments in bad faith as if I'm letting the LLM add slop to satisfy a metric.

No, I'm using an LLM to write good tests that I will personally approve as useful, and other people will review too, before merging into master.

29. stocksinsmocks ◴[] No.46211540{3}[source]
Having used Bentley software products I can tell you with complete certainty that professional software developers have extremely bad judgment when it comes to the need to test software and verify its functionality. Developers just think they know what they’re doing because there’s typically not a strong feedback mechanism that inflicts serious career damage when they do things that are extremely lazy or stupid or unethical. How many people lost their job or had to change their name and live out the rest of their days in Juarez Mexico over AWS’ incomprehensible configuration causing an internet brown out? Anyone? A teenager serves cold onion rings at a burger joint and he’s on the street. Some lazy dweeb at Amazon blows up the internet and - come on, isn’t it about the friends we made along the way? It’s obscene and the lack of professionalism and accountability is a total disgrace.
30. insane_dreamer ◴[] No.46214506[source]
Experience from 2 days ago:

I had CC write a bunch of tests to make sure some refactoring didn't break anything, and then I ran the app and it crashed out of the gate. Why? Because despite the verbosity of the tests it turns out that it had mocked the most important parts to test, so the _actual_ connections weren't being tested, and while CC was happy to claim victory with all tests green, the app was broken.

31. WesleyJohnson ◴[] No.46220938{4}[source]
I didn't intend to imply "best" even in the scope of a team, let alone every software engineer in the world. But, I understand your point and it's fair.