233 points JnBrymn | 1 comments
yunwal ◴[] No.45661042[source]
> The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

> Maybe it makes more sense that all inputs to LLMs should only ever be images.

So, what, every time I want to ask an LLM a question I paint a picture? I mean at that point why not just say "all input to LLMs should be embeddings"?

replies(4): >>45661392 #>>45675872 #>>45676027 #>>45678135 #
1. fspeech ◴[] No.45675872[source]
If you can read your input on your screen, your computer apparently knows how to convert your text to images.