Unrolling as a performance optimization is usually slightly different, typically working in batches rather than unrolling the entire thing, even when the length is known at compile time.
The docs suggest not using `inline` for performance without evidence it helps in your specific usage, largely because the bloated binary is likely to be slower unless you have a good reason to believe your case is special, and also because `inline` _removes_ optimization potential from the compiler rather than adding it (its inlining passes are very, very good, and despite having an extremely good grasp on which things should be inlined I rarely outperform the compiler -- I'm never worse, but the ability to not have to even think about it unless/until I get to the microoptimization phase of a project is liberating).