
LLM Visualization

(bbycroft.net)
638 points by gmays | 1 comment
southp ◴[] No.45136237[source]
It's fascinating, even though my knowledge of LLMs is so limited that I don't really understand what's happening. I'm curious how the examples are plotted and how closely they resemble the real models, though. If one day we could reliably map an LLM into modules like this using an algorithm, would that mean we could turn LLMs into chips rather than data centers?
replies(5): >>45136340 #>>45136985 #>>45136988 #>>45137239 #>>45166151 #
1. visarga ◴[] No.45136988[source]
The resemblance is pretty good; they can't show every detail because the diagram would become hard to read, but the essential parts are there.

I find the model extremely simple; you can write the attention equation on a napkin.

This is the core idea:

Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V

The attention process itself is based on an all-to-all similarity calculation, Q * K^T: every query is compared against every key, as in the sketch below.
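
For anyone who wants to poke at it, here's a minimal NumPy sketch of that equation. The variable names and shapes are just illustrative, not taken from any particular model:

    import numpy as np

    def attention(Q, K, V):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)   # all-to-all similarity, shape (n_q, n_k)
        # Row-wise softmax turns similarity scores into attention weights
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V                # weighted sum of values, shape (n_q, d_v)

    # Toy example: 4 tokens, d_k = d_v = 8
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)       # (4, 8)

In a real transformer this runs per head, with Q, K, V produced by learned linear projections of the token embeddings, but the core computation is exactly the one-liner above.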