
LLM Visualization

(bbycroft.net)
638 points by gmays | 1 comment
southp ◴[] No.45136237[source]
It's fascinating, even though my knowledge of LLMs is so limited that I don't really understand what's happening. I'm curious how the examples are plotted and how closely they resemble the real models, though. If one day we could reliably map an LLM into modules like this using an algorithm, would that mean we could turn LLMs into chips rather than data centers?
replies(5): >>45136340 #>>45136985 #>>45136988 #>>45137239 #>>45166151 #
1. visarga ◴[] No.45136988[source]
The resemblance is pretty good; they can't show every detail because the diagram would become hard to read, but the essential parts are there.

I find the model extremely simple; you can write the attention equation on a napkin.

This is the core idea:

Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V

The attention process itself is based on an all-to-all similarity calculation, Q * K^T: every query is compared against every key, as in the sketch below.
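
For anyone who wants to poke at it, here's a minimal NumPy sketch of that equation. The variable names and shapes are just illustrative, not taken from any particular model:

    import numpy as np

    def attention(Q, K, V):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)   # all-to-all similarity, shape (n_q, n_k)
        # Row-wise softmax turns similarity scores into attention weights
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V                # weighted sum of values, shape (n_q, d_v)

    # Toy example: 4 tokens, d_k = d_v = 8
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)       # (4, 8)

In a real transformer this runs per head, with Q, K, V produced by learned linear projections of the token embeddings, but the core computation is exactly the one-liner above.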