Last time I tested branchless UTF-8 algorithms, I came to the conclusion that they only perform [slightly] better for text consisting of foreign multibyte characters. Unless you expect lots of such inputs on the hot path, just go with traditional algorithms instead. Even in the worst case the difference isn't that big.
Sometimes people fail to appreciate how insanely fast a predictable branch really is.