They process the audio, but they stumble enough with recall that you cannot really trust the output.
I had a problem where I used GPT-4o to help me with inventory management, something a 5th grade kid could handle, and it kept screwing up values for a list of ~50 components. I ended up spending more time trying to get it to properly parse the input audio (I read off the counts as I moved through inventory bins) than if I had just done it manually.
On the other hand, I have had good success with having it write simple programs and apps. So YMMV quite a lot more than with a regular person.