(blog.voyageai.com)

263 points fzliu | 1 comments | 17 Nov 24 07:42 UTC

1. unit149 [17 Nov 24 10:57 UTC] No.42163393

In the existing Python API, Voyage tokenizes blocks of text and outputs a sequence of tokens. This model seems to do something analogous for images, vectorizing them into the same space as the text.

Common words like 'you' and 'apple' are each a single token. Rarer or longer terms like 'pikachu' may be split into several pieces, e.g. pik-a-chu. [1]
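To make the splitting concrete, here is a rough sketch of a greedy longest-match subword tokenizer. This is not Voyage's actual tokenizer: `greedy_tokenize` and `TOY_VOCAB` are made up for illustration, and the vocabulary is chosen so that 'pikachu' splits the way described above.

```python
def greedy_tokenize(word, vocab):
    """Split `word` into the longest matching vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


# Hypothetical vocabulary, invented for this example only.
TOY_VOCAB = {"you", "apple", "pik", "a", "chu"}

for w in ("you", "apple", "pikachu"):
    print(w, greedy_tokenize(w, TOY_VOCAB))
```

Real tokenizers learn their vocabulary from data (BPE, WordPiece and so on), so the actual split of any given word depends on the model.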

[1]: https://docs.voyageai.com/docs/tokenization


All-in-one embedding model for interleaved text, images, and screenshots