
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 1 comment
mtrovo ◴[] No.42951263[source]
I found the discussion around inference scaling with the 'Wait' hack so surreal. The fact that such an ingeniously simple method can impact performance makes me wonder how much low-hanging fruit we're still missing. So weird to think that progress in a branch of computer science boils down to conjuring the right incantation words. How do you even change your mindset to start thinking this way?
replies(16): >>42951704 #>>42951764 #>>42951829 #>>42953577 #>>42954518 #>>42956436 #>>42956535 #>>42956674 #>>42957820 #>>42957909 #>>42958693 #>>42960400 #>>42960464 #>>42961717 #>>42964057 #>>43000399 #
codeulike ◴[] No.42960464[source]
Wait, so the trick is they reach into the context and basically replace '</think>' with 'wait', and that makes it carry on thinking?
replies(3): >>42961113 #>>42962970 #>>42963406 #
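[Editor's note: for the curious, here is a minimal sketch of the trick codeulike describes, assuming a Hugging Face transformers-style API. The model name, the <think>/</think> delimiters, the 512-token budget, and the generate_with_wait helper are all illustrative assumptions; the s1 authors' actual budget-forcing implementation may differ.]

    # Sketch: when the model emits its end-of-thinking delimiter, cut it off
    # and append "Wait" so the next generation pass continues reasoning.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; s1 fine-tunes a larger Qwen2.5 model
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)

    def generate_with_wait(prompt: str, max_waits: int = 2) -> str:
        text = prompt + "\n<think>\n"  # seed a reasoning section (assumed delimiters)
        for i in range(max_waits + 1):
            inputs = tok(text, return_tensors="pt")
            out = model.generate(**inputs, max_new_tokens=512)
            # Decode only the newly generated continuation.
            new = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
            if "</think>" in new and i < max_waits:
                # The model tried to close its reasoning: keep everything before
                # the closing tag and force it to keep thinking instead.
                text += new.split("</think>", 1)[0].rstrip() + "\nWait"
            else:
                text += new
                break
        return text

[The interesting design choice is that nothing about the model changes: the extra "thinking" is bought purely by editing the decoded text between generation passes.]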
danans ◴[] No.42963406[source]
Not sure if your pun was intended, but 'wait' probably works so well because the models were trained on text structured like your comment, where "wait" is followed by a deeper insight.