302 points | JnBrymn | 1 comment
yunwal No.45661042
> The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

> Maybe it makes more sense that all inputs to LLMs should only ever be images.

So, what, every time I want to ask an LLM a question I paint a picture? I mean at that point why not just say "all input to LLMs should be embeddings"?

replies(4): >>45661392 >>45675872 >>45676027 >>45678135
1. awesome_dude No.45678135
I mean, text is, after all, just highly stylised images.

It's trivial for text to be pasted in and converted to pixels (that's what my computer, and every other computer on the planet, does when showing me text).
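
To make the "text becomes pixels" step concrete, here is a minimal sketch in plain Python. The 3x5 `GLYPHS` table and the `render` function are hypothetical, made up for illustration; real text rendering uses proper fonts, hinting and anti-aliasing, and this says nothing about how any particular LLM ingests images.

```python
# Hypothetical toy 3x5 bitmap font covering only the characters used below.
GLYPHS = {
    "H": ["X.X", "X.X", "XXX", "X.X", "X.X"],
    "I": ["XXX", ".X.", ".X.", ".X.", "XXX"],
    " ": ["...", "...", "...", "...", "..."],
}

GLYPH_HEIGHT = 5


def render(text):
    """Rasterise `text` into GLYPH_HEIGHT rows of 0/1 pixels.

    Glyphs are separated by one blank column. Characters missing
    from the toy font raise KeyError.
    """
    rows = [[] for _ in range(GLYPH_HEIGHT)]
    for i, ch in enumerate(text):
        glyph = GLYPHS[ch]
        for y in range(GLYPH_HEIGHT):
            if i > 0:
                rows[y].append(0)  # blank spacing column between glyphs
            rows[y].extend(1 if c == "X" else 0 for c in glyph[y])
    return rows


pixels = render("HI")
for row in pixels:
    # Show lit pixels as '#' so the bitmap is readable in a terminal.
    print("".join("#" if p else " " for p in row))
```

The resulting `pixels` grid is the kind of thing an image-input model would see in place of text tokens, only at a far higher resolution and with a real font.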