Again, I'm not the OP, so I can't speak to their exact use case, but the vast majority of call center calls fall into really clear buckets.
To give you an idea: Phonetic transcription was the "state of the art" when I was a QA analyst. It broke call audio apart into a stream of phonemes, and when you ran a search, it would similarly convert your query into a string of phonemes, then look for a match. As you can imagine, this is pretty error-prone and you have to get a little clever with it, but realistically it was more than good enough for the scale we operated at.
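A rough sketch of the idea, if it helps. Everything here is invented for illustration: the phoneme dictionary is a toy stand-in for a real grapheme-to-phoneme model, and the matcher is just difflib, not whatever the actual vendor used:

    # Toy phoneme-based search: convert both the transcript and the query
    # into phoneme strings, then look for approximate matches.
    from difflib import SequenceMatcher

    # Made-up mini dictionary standing in for a real grapheme-to-phoneme model.
    TOY_G2P = {
        "refund": ["R", "IH", "F", "AH", "N", "D"],
        "refined": ["R", "IH", "F", "AY", "N", "D"],
        "please": ["P", "L", "IY", "Z"],
        "i": ["AY"],
        "want": ["W", "AA", "N", "T"],
        "a": ["AH"],
    }

    def to_phonemes(text):
        """Flatten a sentence into one phoneme sequence, skipping unknown words."""
        phones = []
        for word in text.lower().split():
            phones.extend(TOY_G2P.get(word, []))
        return phones

    def phonetic_match(query, transcript, threshold=0.8):
        """Slide the query's phonemes across the transcript and keep windows
        whose similarity clears the threshold."""
        q = to_phonemes(query)
        t = to_phonemes(transcript)
        hits = []
        for start in range(len(t) - len(q) + 1):
            window = t[start:start + len(q)]
            score = SequenceMatcher(None, q, window).ratio()
            if score >= threshold:
                hits.append((start, round(score, 2)))
        return hits

    # "refined" shares most of its phonemes with "refund", so a search for
    # "refund" can fire on either -- exactly the kind of error-prone match
    # you had to get clever about.
    print(phonetic_match("refund", "i want a refund please"))
    print(phonetic_match("refund", "i want a refined please", threshold=0.6))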
If it were an ecom site, you'd already know the categories of calls you're interested in, because you've been tracking them manually for years: something like "late delivery", "broken item", "unexpected out of stock", "missing pieces", etc.
Basically, you'd have a lot of known context to anchor the LLM's analysis, which would (probably) cover the vast majority of your calls, leaving you free to deal with the outliers more directly.
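Something like this, to make it concrete. The category list is the one above; `call_model` is a placeholder for whatever hosted model you'd actually hit, and the prompt shape is my guess, not anything the OP described:

    # Anchor the LLM with the categories you already track by hand, and
    # route anything it can't place into an "other" bucket for a human.
    KNOWN_CATEGORIES = [
        "late delivery",
        "broken item",
        "unexpected out of stock",
        "missing pieces",
    ]

    def build_prompt(transcript):
        cats = "\n".join(f"- {c}" for c in KNOWN_CATEGORIES)
        return (
            "Classify this customer call into exactly one of the categories "
            f"below, or answer 'other' if none fit:\n{cats}\n\n"
            f"Transcript:\n{transcript}\n\nCategory:"
        )

    def call_model(prompt):
        # Placeholder: swap in a real model endpoint here.
        return "late delivery"

    def classify_call(transcript):
        answer = call_model(build_prompt(transcript)).strip().lower()
        # Anything outside the known buckets goes to a person, not a dashboard.
        return answer if answer in KNOWN_CATEGORIES else "other"

    print(classify_call("My order was supposed to arrive Tuesday and it's still not here."))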
At my job as a software dev, having an LLM summarize a meeting incorrectly can be really, really bad, so I appreciate the point you're making. But at a call center for an F500 company, you're looking for trends and you're aware of your false positive/negative rates. Realistically, those can be relatively high and still provide a lot of value.
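A back-of-the-envelope illustration of why, with entirely made-up error rates and call counts: as long as the classifier's mistakes are roughly constant week to week, a real spike in a category still shows up in the noisy counts.

    # Made-up numbers: a classifier that misses 20% of real hits and
    # false-alarms on 2% of everything else still surfaces the week 3 spike.
    import random

    random.seed(0)

    def observed_count(true_positives, total_calls, fpr=0.02, fnr=0.20):
        """What the dashboard shows: real hits the classifier caught,
        plus false alarms on calls that weren't actually in the category."""
        negatives = total_calls - true_positives
        caught = sum(random.random() > fnr for _ in range(true_positives))
        false_alarms = sum(random.random() < fpr for _ in range(negatives))
        return caught + false_alarms

    total_calls = 5000
    true_late_deliveries = {"week 1": 200, "week 2": 210, "week 3": 480}  # real spike in week 3

    for week, true_n in true_late_deliveries.items():
        print(week, observed_count(true_n, total_calls))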
Also, if it's a really large company, they almost certainly had someone validate the calls, second by second, against the summaries (I know because that was my job for a period of time). That's the minimum bar for _any_ call analysis software if you want to justify the spend. Sure, it's possible that step got hand-waved, but as the person responsible for the outcome of the new LLM summarization approach, you'd be really screwing yourself by hand-waving a product that made you measurably less effective. There are better ways to integrate the AI hype train into a QA department than replacing the foundation of your analysis, if that's all you're trying to do.