
228 points by nkko | 4 comments
1. Der_Einzige (No.43888091)
Related to this, our min_p paper was ranked #18 out of 12,000 submissions at ICLR and was accepted as an oral:

https://iclr.cc/virtual/2025/oral/31888

Our poster was popular:

poster: https://iclr.cc/media/PosterPDFs/ICLR%202025/30358.png?t=174...

oral presentation (I'm the 2nd speaker, starting around the 19:30 mark; watch me roast Yoshua Bengio on this topic and then have him be the first questioner. My slides for the presentation are there too, and they're really funny): https://iclr.cc/virtual/2025/session/31936

paper: https://arxiv.org/abs/2407.01082

As one of the min_p authors, I can confirm that top N sigma is currently the best general-purpose sampler by far. Also, temperature can and should be scaled far higher than it is today: temperatures of 100 are totally fine with techniques like min_p and top N sigma.
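For readers unfamiliar with the two samplers, here is a minimal Python sketch of how I understand them (not the papers' reference code). min_p keeps tokens whose probability is at least `p_base` times the top token's probability; top N sigma keeps tokens whose logit is within `n` standard deviations of the maximum logit. The function names, the use of the population standard deviation, and applying truncation before temperature scaling are assumptions; real libraries differ on ordering. Since dividing every logit by a temperature T scales both the maximum and σ by 1/T, the top N sigma mask is the same at any temperature, which is one way to see why a temperature of 100 only flattens the distribution over the surviving tokens instead of letting junk tokens back in.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_mask(probs, p_base):
    """min_p: keep tokens with probability >= p_base * (probability of the top token)."""
    threshold = p_base * max(probs)
    return [p >= threshold for p in probs]


def top_n_sigma_mask(logits, n):
    """top N sigma: keep tokens whose logit is >= max_logit - n * std(logits)."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [x >= threshold for x in logits]


def sample(logits, mask, temperature, rng):
    """Sample a token index from the surviving tokens, after temperature scaling."""
    kept = [i for i, keep in enumerate(mask) if keep]
    probs = softmax([logits[i] for i in kept], temperature)
    return rng.choices(kept, weights=probs, k=1)[0]


logits = [5.0, 4.5, 4.0, 1.0, -2.0, -3.0]  # hypothetical next-token logits
print(top_n_sigma_mask(logits, n=1.0))                      # [True, True, True, False, False, False]
print(top_n_sigma_mask([x / 100 for x in logits], n=1.0))  # same mask at temperature 100
print(min_p_mask(softmax(logits), p_base=0.1))             # [True, True, True, False, False, False]
print(sample(logits, top_n_sigma_mask(logits, 1.0), temperature=100.0, rng=random.Random(0)))  # one of 0, 1, 2
```

At temperature 100 the three surviving tokens become nearly equiprobable, but tokens 3 to 5 can never be drawn.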

Also, the special case of top_k = 2 with ultra-high temperature (something the authors recommend against near the end of the paper) is very interesting in its own right. It produces a spelling error roughly every 10th word, but it also seems to have a certain creativity to it.
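A small illustration of what top_k = 2 at very high temperature does (my own toy example, not from the paper; the function name and logits are made up): once only the two highest logits survive, dividing by a large temperature shrinks the gap between them toward zero, so every step becomes close to a coin flip between the top two tokens. That is one plausible reason for both the misspellings and the unusual word choices.

```python
import math


def top_k_probs(logits, k, temperature):
    """Truncate to the k highest logits, then apply a temperature-scaled softmax to the survivors."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}


logits = [6.0, 3.0, 2.5, 0.0]  # hypothetical next-token logits
print(top_k_probs(logits, k=2, temperature=1.0))    # top token dominates (~0.95 vs ~0.05)
print(top_k_probs(logits, k=2, temperature=100.0))  # close to 50/50
```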

2. toxik (No.43888420, reply to No.43888091)
Are there any samplers that aren't basically greedy, i.e., that actually search the tree? I realize the branching factor is absolutely insane and expanding nodes is quite expensive, but it has always seemed odd to me that we don't actually search.
3. Kubuxu (No.43888793, reply to No.43888420)
Beam search sampling is sometimes used.
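To contrast with greedy decoding, here is a minimal beam search sketch over a hypothetical toy next-token model (`TOY_MODEL` and its probabilities are invented for illustration; a real model would score the whole vocabulary). Greedy decoding commits to the single most likely token at each step, while beam search keeps the `beam_width` best partial sequences by cumulative log-probability, so it can find a sequence whose first token was not the top choice.

```python
import math

# Hypothetical toy model: next-token probabilities for each prefix (invented numbers).
TOY_MODEL = {
    (): {"the": 0.5, "a": 0.4, "an": 0.1},
    ("the",): {"cat": 0.4, "dog": 0.3, "end": 0.3},
    ("a",): {"cat": 0.9, "dog": 0.05, "end": 0.05},
    ("an",): {"owl": 1.0},
}


def next_token_logprobs(prefix):
    """Log-probabilities of every next token given a prefix, from the toy model."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}


def greedy_decode(steps):
    """Pick the single most likely next token at every step."""
    seq, score = [], 0.0
    for _ in range(steps):
        tok, lp = max(next_token_logprobs(seq).items(), key=lambda kv: kv[1])
        seq.append(tok)
        score += lp
    return seq, score


def beam_search(steps, beam_width):
    """Keep the beam_width best partial sequences by cumulative log-probability."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in next_token_logprobs(seq).items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]


for name, (seq, score) in [("greedy", greedy_decode(2)), ("beam", beam_search(2, 2))]:
    print(name, seq, round(math.exp(score), 2))
# greedy ['the', 'cat'] 0.2
# beam ['a', 'cat'] 0.36
```

With `beam_width=1` this reduces to greedy decoding; wider beams trade compute for a less myopic search, though still far from exhaustive.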
4. Der_Einzige (No.43890210, reply to No.43888420)
Besides beam search and its variants? (There are many, including the little-known but awesomely powerful constrained beam search: https://huggingface.co/blog/constrained-beam-search)
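A much-simplified sketch of the idea behind constrained beam search, loosely following the "banks" approach described in the linked Hugging Face post: hypotheses are grouped by whether they already satisfy the constraint, and the beam is filled by alternating between groups, so a hypothesis containing the required token is not pruned just because its score is lower. This handles only a single required token, and the toy model and function names are hypothetical; it is not the Transformers implementation. On this toy model, plain beam search with width 2 keeps only "a cat" and "the cat", so it would never emit "dog".

```python
import math

# Hypothetical toy model: next-token probabilities for each prefix (invented numbers).
TOY_MODEL = {
    (): {"the": 0.5, "a": 0.4, "an": 0.1},
    ("the",): {"cat": 0.4, "dog": 0.3, "end": 0.3},
    ("a",): {"cat": 0.9, "dog": 0.05, "end": 0.05},
    ("an",): {"owl": 1.0},
}


def next_token_logprobs(prefix):
    """Log-probabilities of every next token given a prefix, from the toy model."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}


def constrained_beam_search(steps, beam_width, required):
    """Beam search that must output `required`, using two banks of hypotheses."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in next_token_logprobs(seq).items()
        ]
        by_score = sorted(candidates, key=lambda c: c[1], reverse=True)
        met = [c for c in by_score if required in c[0]]        # constraint satisfied
        unmet = [c for c in by_score if required not in c[0]]  # not yet satisfied
        # Fill the beam by alternating between banks, so hypotheses that satisfy the
        # constraint survive even when their raw score is lower than the others.
        beams = []
        while len(beams) < beam_width and (met or unmet):
            for bank in (met, unmet):
                if bank and len(beams) < beam_width:
                    beams.append(bank.pop(0))
    finished = [b for b in beams if required in b[0]]
    return max(finished, key=lambda b: b[1]) if finished else None


seq, score = constrained_beam_search(steps=2, beam_width=2, required="dog")
print(seq, round(math.exp(score), 2))  # ['the', 'dog'] 0.15
```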

Does MBR (minimum Bayes risk) sampling count?
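MBR decoding picks, from a set of candidate outputs, the one with the highest expected utility under the model, usually estimating that expectation by scoring each candidate against the other samples. A minimal sketch, assuming unigram F1 as the utility (real setups more often use metrics such as BLEU, chrF, or BERTScore) and a hand-written list standing in for actual model samples; `unigram_f1` and `mbr_select` are hypothetical names.

```python
from collections import Counter


def unigram_f1(a, b):
    """Token-overlap F1 between two whitespace-tokenized strings (a stand-in utility)."""
    ta, tb = Counter(a.split()), Counter(b.split())
    overlap = sum((ta & tb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ta.values())
    recall = overlap / sum(tb.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest average utility against the other candidates.

    The other samples act as pseudo-references approximating the model's distribution.
    """
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    return candidates[max(range(len(candidates)), key=expected_utility)]


samples = [  # pretend these were drawn from the model
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat was on the mat",
    "dogs bark loudly at night",
]
print(mbr_select(samples))  # the cat sat on a mat
```

The selected output is the "consensus" candidate that agrees most with the rest, rather than the single highest-probability one.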

Also there was this paper at ICLR which is relevant to this question: https://arxiv.org/abs/2410.03968

This paper basically claims that non-heuristic methods (like beam search) are harmful compared to the heuristic ones.