
183 points vortex_ape | 1 comments
ThatGuyRaion ◴[] No.42742391[source]
So could this potentially improve performance?
replies(2): >>42742494 #>>42742719 #
PhilipRoman ◴[] No.42742494[source]
Last time I tested branchless UTF-8 algorithms, I came to the conclusion that they only perform [slightly] better for text consisting of foreign multibyte characters. Unless you expect lots of such inputs on the hot path, just go with traditional algorithms instead. Even in the worst case the difference isn't that big.

Sometimes people fail to appreciate how insanely fast a predictable branch really is.

replies(2): >>42744257 #>>42747758 #
1. Laiho ◴[] No.42747758[source]

  fn validate_ascii(bytes: &[u8]) -> bool {
      // AND the per-byte results together without branching; ASCII bytes are <= 127.
      bytes.iter().fold(true, |acc, b| acc & (*b <= 127))
  }
This check will likely be the best for English text/code. You can check in chunks of varying size depending on how common you expect non-ASCII to be. If it's all ASCII, you can move 128 bytes forward on AVX2 in a couple of cycles.
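To make the chunked idea concrete, here is a rough sketch that tests 8 bytes at a time with a plain `u64` mask rather than AVX2 intrinsics. The name `validate_ascii_chunked` and the 8-byte chunk size are assumptions picked for illustration, not something measured; a compiler may well auto-vectorize the simpler fold just as effectively.

```rust
use std::convert::TryInto;

/// Hypothetical helper: returns true if every byte is ASCII (high bit clear).
/// Checks 8 bytes per step, then handles the leftover tail byte by byte.
fn validate_ascii_chunked(bytes: &[u8]) -> bool {
    // One high bit per byte lane; any set bit means a non-ASCII byte.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let mut chunks = bytes.chunks_exact(8);
    for chunk in &mut chunks {
        // chunks_exact guarantees exactly 8 bytes, so the conversion cannot fail.
        let word = u64::from_ne_bytes(chunk.try_into().unwrap());
        if word & HIGH_BITS != 0 {
            return false;
        }
    }
    // Fewer than 8 bytes remain; check them individually.
    chunks.remainder().iter().all(|b| b.is_ascii())
}
```

The early return is a branch, but for mostly-ASCII input it is almost never taken, so it should stay cheap and predictable. If the check passes, the caller can skip full UTF-8 validation for that slice.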