
228 points by nkko | 1 comment
mdp2021 (No.43888295)
When the aim, though, is to have the LLM output an "idea", not just a "next token", sampling over the logits vector should break that original idea... If the idea is complete, there should be no need for sampling over the logits at all.

The sampling, in this framework, should not happen near the output level ("what will the next spoken word be").

replies(1): >>43888329 #
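A minimal sketch of the output-level selection being described here, assuming a toy four-token vocabulary (all token names hypothetical): greedy decoding takes the argmax of the logits, while temperature sampling draws from their softmax.

    import numpy as np

    rng = np.random.default_rng(0)

    logits = np.array([3.2, 2.9, 0.5, -1.0])  # scores for 4 candidate tokens
    vocab = ["cat", "dog", "tree", "quark"]   # hypothetical vocabulary

    def sample(logits, temperature=1.0):
        # Draw one token index from softmax(logits / temperature).
        z = logits / temperature
        p = np.exp(z - z.max())  # subtract max for numerical stability
        p /= p.sum()
        return rng.choice(len(logits), p=p)

    greedy = vocab[int(np.argmax(logits))]            # deterministic: always "cat"
    sampled = vocab[sample(logits, temperature=0.8)]  # stochastic choice
    print(greedy, sampled)

As temperature approaches 0 the sample collapses to the greedy choice; higher temperatures spread probability onto lower-ranked tokens, which is exactly the randomness the parent comment argues should be unnecessary if the "idea" were already complete.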
minimaxir (No.43888329)
LLMs are trained to maximize the probability of the correct next token, not of "ideas". You cannot define an idea as a training loss objective.
replies(2): >>43888414 #>>43888493 #
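A minimal sketch of the objective being referred to, assuming standard autoregressive training (the shapes and random tensors are placeholders, not a real model): cross-entropy between each position's predicted logits and the actual next token.

    import torch
    import torch.nn.functional as F

    batch, seq_len, vocab_size = 2, 8, 100
    logits = torch.randn(batch, seq_len, vocab_size)      # stand-in model output
    tokens = torch.randint(vocab_size, (batch, seq_len))  # stand-in token ids

    # Predict token t+1 from the logits at position t: shift both by one.
    pred = logits[:, :-1, :].reshape(-1, vocab_size)
    target = tokens[:, 1:].reshape(-1)
    loss = F.cross_entropy(pred, target)  # probability of the correct next token
    print(loss.item())

Nothing in this objective mentions an "idea"; the only supervision signal is the identity of the next token.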
orbital-decay (No.43888493)
Interpretability studies offer several orthogonal ways to look at this; it's like Newtonian vs Lagrangian mechanics. Autoregressive token prediction, pattern matching, idea conceptualization, pathfinding in an extremely high-dimensional space...