Can you give some examples? What LLM? What code? What tests?
As a test, I just asked "ChatGPT 4o with canvas": "Can you write a set of tests to test glBufferData and all of its edge cases?"
glBufferData is a 32 year old API, so there are clearly plenty of examples for it to have looked at. There are even multiple public tests for it, including the official tests, which are open source and so easily scannable. It failed.
It wrote 8 tests, and 7 of them were wrong in that they did something wrong intentionally and then asserted that they got no error. It wasn't close to comprehensive. It didn't test that the function actually put data in the buffer, for example, nor did it check the set of valid enums to see that they work. Nor did it check that the target parameter actually works and affects the correct buffer bound to that target.
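For reference, here is a rough sketch of the kind of checks I mean, not a comprehensive suite. It uses WebGL2-style names because that makes readback easy to show. The GLSubset interface and the checkBufferData helper are hypothetical names made up for this example, and I'm assuming a real WebGL2 context would fit the interface structurally.

```typescript
// Hypothetical minimal subset of a WebGL2-style context (an assumption for
// this sketch; a real WebGL2RenderingContext is expected to fit it).
interface GLSubset {
  readonly ARRAY_BUFFER: number;
  readonly ELEMENT_ARRAY_BUFFER: number;
  readonly STATIC_DRAW: number;
  readonly BUFFER_SIZE: number;
  readonly NO_ERROR: number;
  readonly INVALID_ENUM: number;
  createBuffer(): unknown;
  bindBuffer(target: number, buffer: unknown): void;
  bufferData(target: number, data: Uint8Array, usage: number): void;
  getBufferSubData(target: number, srcByteOffset: number, dst: Uint8Array): void;
  getBufferParameter(target: number, pname: number): unknown;
  getError(): number;
}

// Hypothetical helper: returns a list of failure messages (empty means pass).
function checkBufferData(gl: GLSubset): string[] {
  const failures: string[] = [];
  const a = Uint8Array.of(1, 2, 3, 4);
  const b = Uint8Array.of(9, 8, 7, 6, 5, 4);

  // Bind a different buffer to each target, then upload different data
  // through each target.
  gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, gl.createBuffer());
  gl.bufferData(gl.ARRAY_BUFFER, a, gl.STATIC_DRAW);
  gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, b, gl.STATIC_DRAW);
  if (gl.getError() !== gl.NO_ERROR) {
    failures.push("valid calls raised an error");
  }

  // Each target must hold exactly the data uploaded through it.
  const cases = [
    { name: "ARRAY_BUFFER", target: gl.ARRAY_BUFFER, expected: a },
    { name: "ELEMENT_ARRAY_BUFFER", target: gl.ELEMENT_ARRAY_BUFFER, expected: b },
  ];
  for (const { name, target, expected } of cases) {
    if (gl.getBufferParameter(target, gl.BUFFER_SIZE) !== expected.length) {
      failures.push(`${name}: wrong size`);
    }
    const actual = new Uint8Array(expected.length);
    gl.getBufferSubData(target, 0, actual);
    if (!actual.every((v, i) => v === expected[i])) {
      failures.push(`${name}: wrong contents`);
    }
  }

  // A deliberately invalid call must report an error, not succeed silently.
  gl.bufferData(gl.ARRAY_BUFFER, b, 0); // 0 is not a valid usage enum
  if (gl.getError() !== gl.INVALID_ENUM) {
    failures.push("invalid usage did not raise INVALID_ENUM");
  }
  // The failed call should also have left the existing buffer alone.
  if (gl.getBufferParameter(gl.ARRAY_BUFFER, gl.BUFFER_SIZE) !== a.length) {
    failures.push("failed call modified the buffer");
  }
  return failures;
}
```

A real suite would repeat this across every valid target and usage enum, plus size-only uploads, zero-length data and so on. The point is just that the negative case asserts an error, and the positive cases read the data back.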
This is my experience with LLMs for code so far. I do sometimes get answers to tech questions quicker from LLMs than from searching via Google and reading Stack Overflow. But that's only sometimes. As a recent example, I was trying to add TypeScript types to some JavaScript and it failed. I went round and round telling it that it had failed, but it got stuck in a loop and just kept saying "Oh, sorry. How about this -- repeat of previous code"