https://iclr.cc/virtual/2025/oral/31888
Our poster was popular:
poster: https://iclr.cc/media/PosterPDFs/ICLR%202025/30358.png?t=174...
oral presentation (watch me roast Yoshua Bengio on this topic, and then have him be the first questioner; I'm the 2nd speaker, starting around the 19:30 mark. My slides for the presentation are there too, and they're really funny): https://iclr.cc/virtual/2025/session/31936
paper: https://arxiv.org/abs/2407.01082
As one of the min_p authors, I can confirm that top N sigma is currently the best general-purpose sampler by far. Also, temperature can and should be scaled far higher than it is today: temperatures of 100 are totally fine with techniques like min_p and top N sigma.
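
For anyone who wants to see the mechanics, here is a minimal sketch of both truncation rules in plain Python. It is not the reference implementation from either paper. The function names and toy logits are made up for illustration. It also assumes the kept set is computed from the unscaled logits and temperature is applied only to the survivors; real inference stacks differ on that ordering. Under that assumption, even a temperature of 100 can only spread probability across tokens that already passed the filter.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_keep(logits, min_p=0.1):
    """Indices whose probability (at temperature 1) is >= min_p * top probability."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

def top_n_sigma_keep(logits, n=1.0):
    """Indices whose logit is within n standard deviations of the max logit."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [i for i, x in enumerate(logits) if x >= threshold]

def sample(logits, keep, temperature=1.0, rng=random):
    """Assumption: truncate first, then apply temperature to the kept logits only."""
    probs = softmax([logits[i] for i in keep], temperature)
    return rng.choices(keep, weights=probs, k=1)[0]

# Toy logits (hypothetical): two plausible tokens and a long tail.
toy_logits = [10.0, 9.0, 2.0, 1.0, 0.0, -1.0]
kept = top_n_sigma_keep(toy_logits, n=1.0)
token = sample(toy_logits, kept, temperature=100.0)
```

With these toy logits both rules keep only the first two tokens, so sampling at temperature 100 is close to a coin flip between them and never reaches the tail. Note that the top N sigma threshold scales with the logits, so in this sketch the kept set is the same whether the rule runs before or after temperature scaling.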
Also, the special case of top_k = 2 with ultra-high temperature (something the authors recommend against near the end) is very interesting in its own right. It leads to a spelling error roughly every 10th word, but it also seems to have a certain creativity to it.
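
A rough sketch of what that setting does numerically, with hypothetical function names and made-up logits: with only two candidates left, the runner-up's probability depends only on the logit gap divided by the temperature, so at very high temperature it approaches 50% no matter how confident the model was.

```python
import math
import random

def runner_up_probability(gap, temperature):
    """Probability of the 2nd-ranked token when only the top 2 are kept.

    gap is (top logit - runner-up logit); this is a two-way softmax.
    """
    return 1.0 / (1.0 + math.exp(gap / temperature))

def top_k_sample(logits, k=2, temperature=100.0, rng=random):
    """Keep the k highest logits, then sample them at the given temperature."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical case: the model strongly prefers token 0 (gap of 5 logits).
p_normal = runner_up_probability(5.0, 1.0)    # about 0.0067
p_hot = runner_up_probability(5.0, 100.0)     # about 0.4875
```

One possible reading of the spelling errors is that whenever the second-ranked token is a poor continuation, this setup still picks it almost half the time. That is a guess, not something the sketch demonstrates.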