
228 points by nkko | 1 comment
mdp2021 (No.43888295)
When the aim, though, is to have the LLM output an "idea", not just a "next token", sampling over the logits vector should break that original idea... If the idea is complete, there should be no need for sampling over the logits at all.

The sampling, in this framework, should not happen near the output level ("what will the next spoken word be").

replies(1): >>43888329 #
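A minimal sketch of the output-level selection being described here, assuming a toy four-token vocabulary (all token names hypothetical): greedy decoding takes the argmax of the logits, while temperature sampling draws from their softmax.

    import numpy as np

    rng = np.random.default_rng(0)

    logits = np.array([3.2, 2.9, 0.5, -1.0])  # scores for 4 candidate tokens
    vocab = ["cat", "dog", "tree", "quark"]   # hypothetical vocabulary

    def sample(logits, temperature=1.0):
        # Draw one token index from softmax(logits / temperature).
        z = logits / temperature
        p = np.exp(z - z.max())  # subtract max for numerical stability
        p /= p.sum()
        return rng.choice(len(logits), p=p)

    greedy = vocab[int(np.argmax(logits))]            # deterministic: always "cat"
    sampled = vocab[sample(logits, temperature=0.8)]  # stochastic choice
    print(greedy, sampled)

As temperature approaches 0 the sample collapses to the greedy choice; higher temperatures spread probability onto lower-ranked tokens, which is exactly the randomness the parent comment argues should be unnecessary if the "idea" were already complete.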
minimaxir (No.43888329)
LLMs are trained to maximize the probability of the correct next token, not of "ideas". You cannot define an idea as a training loss objective.
replies(2): >>43888414 #>>43888493 #
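A minimal sketch of the objective being referred to, assuming standard autoregressive training (the shapes and random tensors are placeholders, not a real model): cross-entropy between each position's predicted logits and the actual next token.

    import torch
    import torch.nn.functional as F

    batch, seq_len, vocab_size = 2, 8, 100
    logits = torch.randn(batch, seq_len, vocab_size)      # stand-in model output
    tokens = torch.randint(vocab_size, (batch, seq_len))  # stand-in token ids

    # Predict token t+1 from the logits at position t: shift both by one.
    pred = logits[:, :-1, :].reshape(-1, vocab_size)
    target = tokens[:, 1:].reshape(-1)
    loss = F.cross_entropy(pred, target)  # probability of the correct next token
    print(loss.item())

Nothing in this objective mentions an "idea"; the only supervision signal is the identity of the next token.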
orbital-decay (No.43888493)
Interpretability studies offer several orthogonal ways to look at this; it's like Newtonian vs Lagrangian mechanics. Autoregressive token prediction, pattern matching, idea conceptualization, pathfinding in an extremely high-dimensional space...