I vaguely remember a benchmark I read a while back for some other io_uring project. It suggested that an individual io_uring syscall is more expensive than the syscalls it was being used to replace. io_uring is still a big improvement overall, even if not as big as you'd hope.
I wish I could remember the post, but I've had that impression in the back of my mind ever since.
The only vDSO-capable calls are clock_gettime, getcpu, getrandom, gettimeofday, and time. (Other architectures have a few more, mostly related to signals and CPU cache flushing.)
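
If you want to check that the vDSO is actually mapped into a process, here's a rough Python sketch that looks for the `[vdso]` entry in `/proc/self/maps`. It assumes Linux with procfs mounted, and `find_vdso_mapping` is just a name I made up for illustration. It only locates the mapping; it doesn't list which symbols the vDSO exports.

```python
from typing import Optional, Tuple


def find_vdso_mapping(maps_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) addresses of the [vdso] mapping, or None if absent.

    maps_text is expected to be in the /proc/<pid>/maps format, where each
    line starts with "start-end" in hex and may end with a pathname field.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        # The kernel labels the vDSO mapping with the pseudo-path "[vdso]".
        if fields and fields[-1] == "[vdso]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            mapping = find_vdso_mapping(f.read())
    except OSError:
        mapping = None  # not Linux, or procfs not available
    if mapping is None:
        print("no [vdso] mapping found")
    else:
        print(f"vDSO mapped at {mapping[0]:#x}-{mapping[1]:#x}")
```

The mapping is typically only a page or two, which fits with how few calls it covers.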