There is a short list of reasons why it needs to be "debunked":
1. Your phone is gathering data that you don't realize it gathers.
One of the biggest examples of this is real-time location data that cellular carriers broker and sell as aggregated marketing data. You don't have to give any app permission for that, because your cellular carrier can get that data regardless of your phone's OS.
2. Your phone is gathering data that you gave it permission to gather, but perhaps in a way you didn't expect.
For example, let's say you give an app permission to read your entire photo library so that you can upload one photo. Because you granted that permission at the OS level, the app might be uploading more images than the ones you explicitly select. Another example used to be clipboard data, before the OSes started asking permission for clipboard access. One last example is text that you type into a field but never submit.
Another big aspect of this is that people don't realize how these ad networks work in real time. It isn't a slow process for an advertising company to learn something about you and react accordingly; it can happen in a few seconds.
3. The average person doesn't comprehend how easy it is for data science practices to uncover information about you from metadata that seems benign or that you don't know exists.
Most people don't understand how your behavior in an app can tell the company what you like and dislike. The TikTok algorithm is a great example: it can tell what you like from extremely subtle inputs, such as how you swipe and how long you watch a video. A lot of people don't realize how many things about them aren't particularly unique, and how many of their preferences can be tied to a really specific persona that they fall into.
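To make that concrete, here is a rough sketch of how watch time alone could be turned into a guess about interests. Everything in it is made up for illustration: the `events` log, the category names, and the `infer_interests` helper are my own assumptions, not how TikTok or any real recommender actually works.

```python
from collections import defaultdict

# Hypothetical viewing log: (video category, seconds watched, video length in seconds).
events = [
    ("cooking", 28, 30),
    ("cooking", 45, 45),
    ("cars", 3, 40),
    ("cars", 5, 60),
    ("home appliances", 50, 60),
]

def infer_interests(events):
    """Rank categories by the average fraction of each video that was watched."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of watch ratios, count]
    for category, watched, length in events:
        totals[category][0] += min(watched / length, 1.0)
        totals[category][1] += 1
    scores = {cat: ratio_sum / n for cat, (ratio_sum, n) in totals.items()}
    # Highest average completion first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = infer_interests(events)
for category, score in ranking:
    print(f"{category}: {score:.2f}")
```

The user never tapped "like" on anything here. Swiping away from car videos after a few seconds and finishing the cooking ones is enough to rank the categories.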
A real-world example of all of this put together: I was spending a lot of time browsing appliances because I had just bought one, and I went to visit a friend in person. We talked about my new appliance, and later they got ads for that specific appliance. A person's natural reaction would be "it was listening to us!!" but in reality, it is more likely that our cellular carrier or carriers knew we were physically in the same place and reported that piece of information to some kind of data broker. Since there are only a limited number of cellular carriers, that location data may not even have needed to leave the carrier for this to happen. I.e., if we both have the same cellular carrier, that company already has the information and isn't selling it to another company; it's perhaps just telling a data broker that Person A and Person B interact with each other.
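Here is a minimal sketch of how "these two people were together" could fall out of location records with no audio involved. The `pings` data, the tower IDs, and the `co_located_pairs` helper are all hypothetical; I'm assuming a carrier-style log of which subscriber was near which tower in which hour, which may not match what any carrier actually stores.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (subscriber id, cell tower id, hour of day).
pings = [
    ("person_a", "tower_17", 18),
    ("person_b", "tower_17", 18),
    ("person_a", "tower_17", 19),
    ("person_b", "tower_17", 19),
    ("person_c", "tower_02", 18),
]

def co_located_pairs(pings, min_shared=2):
    """Return subscriber pairs seen at the same tower in the same hour at least min_shared times."""
    present = defaultdict(set)  # (tower, hour) -> subscribers seen there
    for subscriber, tower, hour in pings:
        present[(tower, hour)].add(subscriber)

    shared = defaultdict(int)  # (subscriber, subscriber) -> number of shared slots
    for subscribers in present.values():
        for pair in combinations(sorted(subscribers), 2):
            shared[pair] += 1

    return {pair: count for pair, count in shared.items() if count >= min_shared}

pairs = co_located_pairs(pings)
print(pairs)
```

Once a pair like Person A and Person B is linked, Person A's recent appliance browsing becomes a plausible signal for what to show Person B.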
Just note that I'm not claiming this is exactly how it all works, as I'm not in that industry, but the general ideas here apply. The takeaway is that literally recording audio with a microphone just isn't necessary to derive hyper-specific things about people.