
S1: A $6 R1 competitor?

(timkellogg.me)
851 points | tkellogg | 4 comments
mtrovo ◴[] No.42951263[source]
I found the discussion around inference scaling with the 'Wait' hack so surreal. The fact that such an ingeniously simple method can impact performance makes me wonder how much low-hanging fruit we're still missing. So weird to think that improvements in a branch of computer science boil down to conjuring the right incantation words. How do you even change your mindset to start thinking this way?
replies(16): >>42951704 #>>42951764 #>>42951829 #>>42953577 #>>42954518 #>>42956436 #>>42956535 #>>42956674 #>>42957820 #>>42957909 #>>42958693 #>>42960400 #>>42960464 #>>42961717 #>>42964057 #>>43000399 #
ascorbic ◴[] No.42954518[source]
I've noticed that R1 says "Wait," a lot in its reasoning. I wonder if there's something inherently special in that token.
replies(2): >>42954757 #>>42959520 #
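The "Wait" hack being discussed is the s1 paper's "budget forcing": when the model tries to end its reasoning, the end-of-thinking marker is replaced with "Wait" so it keeps going. A minimal sketch, assuming a generic `generate(text) -> continuation` callable standing in for a real LLM call, and a `</think>` marker (both are assumptions, not the paper's exact interface):

```python
def budget_force(generate, prompt, end_marker="</think>",
                 wait_token="Wait", max_forces=2):
    """Each time the model emits the end-of-thinking marker, splice in
    `wait_token` instead and resume generation, up to `max_forces` times."""
    text = prompt
    forces = 0
    while True:
        out = generate(text)
        pos = out.find(end_marker)
        if pos == -1 or forces >= max_forces:
            # Model kept reasoning on its own, or budget exhausted: stop.
            return text + out
        # Cut off the marker and append "Wait" to force more reasoning.
        text += out[:pos] + wait_token
        forces += 1
```

The point the thread circles around is that `wait_token` is just a string parameter here; the paper reportedly compared candidates and found "Wait" worked best.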
1. katzenversteher ◴[] No.42959520[source]
I bet a token like "sht!", "f*" or "damn!" would have the same or an even stronger effect, but the LLM creators would not want users to read them.
replies(3): >>42959617 #>>42960035 #>>42960519 #
2. lodovic ◴[] No.42959617[source]
I think you're onto something. However, since training is done on text and not actual thoughts, it may take some experimentation to find these stronger words.
3. ascorbic ◴[] No.42960035[source]
Maybe, but it doesn't just use it to signify that it's made a mistake. It also uses it in a positive way, such as when it's had a lightbulb moment. Of course, some people use expletives in the same way, but that's less common than for mistakes.
4. raducu ◴[] No.42960519[source]
It's literally in the article: they measured it, and "Wait" was the best token.