In the traditional Python API, the Voyage engine will tokenize blocks of text and output a string of characters. This model seems to be doing that by vectorizing images in space.
Words like 'you' and 'apple' will be a unitary token. More complex terms like 'pikachu' may be divided into pik-a-chu.