Fast algorithms are often more complicated.
To give just one simple example: to get the textbook complexity bound for the Dijkstra algorithm, you need some fancy mergeable heap data structures which are much more complicated, and thus time-intense to implement than the naive implementation.
Or you can get insane low-level optimizations by using the SIMD instructions that modern processors provide. Unluckily, this takes a lot of time and leads to code that is not easy to understand (and thus not easy to write) for "classically trained" programmers.
Yes, you indeed need a lot of skills to write such very fast algorithms, but even for such ultra-smart programmers, finding and applying such optimizations need a lot of development time, which is why this is often only done for code parts that are insanely computation-intense and performance-critical such as video (and sometimes audio) codecs.