About optimization, I feel like NumPy is meant to be a de facto standard and reference implementation. It covers all use cases with decent efficiency rather than aiming to be the fastest option possible. There are more limited drop-in replacements that use more CPU parallelism or a GPU if NumPy isn't fast enough for your use case. I just wish it were clearer which NumPy build I'm installing, because apparently `pip3 install numpy` on my Mac gave me something built with the worst flags possible.
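For anyone who wants to check what their own install was built against, here is a minimal sketch. It assumes NumPy is importable and relies on `np.show_config()`, which prints build details such as the BLAS/LAPACK libraries it found; the exact output format varies between NumPy versions. The helper name `numpy_build_info` is made up for illustration and is not part of NumPy.

```python
import contextlib
import io

import numpy as np


def numpy_build_info() -> str:
    """Return whatever np.show_config() prints, as a string.

    Hypothetical helper for illustration; show_config() itself just
    prints to stdout, so we capture that output here.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    print("NumPy version:", np.__version__)
    # Look for the BLAS/LAPACK entries to see which backend the build uses.
    print(numpy_build_info())
```

This only tells you what the installed build reports about itself; it won't say whether a different wheel or a source build would be faster on your machine.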
About >2 dimensions, I always found this confusing in NumPy but just chalked it up to arrays with more than 2 dimensions being inherently confusing. Maybe there really is a better way.