
263 points by itzlambda | 1 comment
lsy (No.44608975)
If you have a decent understanding of how LLMs work (you put in basically every piece of text you can find, get a statistical machine that models text really well, then use contractors to train it to model text in conversational form), then you probably don't need to consume a big diet of ongoing output from PR people, bloggers, thought leaders, and internet rationalists. That diet is more likely to send you down some unhelpful millenarian path.
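
To make that framing concrete, here's a toy sketch of the "statistical machine that models text" idea, using nothing but next-token counts on a made-up corpus (a real LLM learns these statistics with a neural network rather than a lookup table):

    # Toy illustration of the "statistical model of text" framing: count which
    # token tends to follow which, then predict the next token from those counts.
    # (Real LLMs learn these statistics with a transformer, not a lookup table.)
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count next-token frequencies for each preceding token.
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    # The "model" of text: a probability distribution over what comes after "the".
    counts = following["the"]
    total = sum(counts.values())
    for token, count in counts.most_common():
        print(f"P({token!r} | 'the') = {count / total:.2f}")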

Despite the feeling that it's a fast-moving field, most of the differences in actual models over the last few years are in degree and not kind, and the majority of ongoing work is in tooling and integrations, which you can keep up with as it becomes useful for your work. Remembering that it's a model of text and is ungrounded goes a long way toward discerning what kinds of work it's useful for (where verification of output is either straightforward or unnecessary), and what kinds of work it's not useful for.

1. alphazard (No.44609630)
When explaining LLMs to people, the high-level architecture is often what they find most interesting: not the transformer, but the token-by-token prediction strategy (autoregression), and the fact that the model doesn't always choose the most likely token, but samples one in proportion to its likelihood.
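
For example, here's a minimal sketch of that autoregressive loop, with a hard-coded stand-in for the model so it's runnable (the token names and probabilities are invented):

    import random

    # A stand-in for the model: given the context so far, return a probability
    # distribution over the next token. (A real LLM computes this with a
    # transformer; here it's hard-coded so the loop is self-contained.)
    def toy_model(context):
        if context[-1] == "is":
            return {"blue": 0.55, "cloudy": 0.30, "green": 0.10, ".": 0.05}
        return {".": 1.0}

    def generate(prompt, max_tokens=5):
        tokens = list(prompt)
        for _ in range(max_tokens):
            dist = toy_model(tokens)
            # Sample in proportion to likelihood rather than always taking argmax.
            next_token = random.choices(list(dist), weights=dist.values())[0]
            tokens.append(next_token)
            if next_token == ".":
                break
        return " ".join(tokens)

    print(generate(["the", "sky", "is"]))

Running it a few times shows why the same prompt can yield different continuations: "blue" is the argmax, but "cloudy" still gets picked roughly a third of the time.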

The minutiae of how next-token prediction works are rarely appreciated by laypeople. They don't care about dot products, or embeddings, or any of it. There's basically no advantage to explaining that part, since most people won't understand, retain, or appreciate it.
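
For anyone who does want that part, here is roughly what the minutiae reduce to, with made-up numbers: dot the context's embedding against a vector for each vocabulary word, then softmax the scores into the distribution the sampling step draws from.

    import math

    # All vectors and numbers here are invented for illustration.
    context_embedding = [0.9, -0.2, 0.4]      # hidden state after reading the prompt
    vocab_vectors = {                          # one (tiny) vector per vocabulary word
        "blue":   [1.0, 0.0, 0.5],
        "cloudy": [0.5, 0.1, 0.3],
        "green":  [0.2, -0.5, 0.1],
    }

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Dot products give scores ("logits"); softmax turns them into probabilities.
    logits = {w: dot(context_embedding, v) for w, v in vocab_vectors.items()}
    z = sum(math.exp(s) for s in logits.values())
    probs = {w: math.exp(s) / z for w, s in logits.items()}
    print(probs)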