Tokenizing text is such a hack even though it works pretty well. The state-of-the-art comes out of the gate with an approximation for quantifying language that's wrong on so many levels.
It's difficult to wrap my head around pixels being a more powerful representation of information, but someone's gotta come up with something other than tokenizer.