
S1: A $6 R1 competitor?

(timkellogg.me)
851 points | tkellogg
jebarker ◴[] No.42948939[source]
S1 (and R1, tbh) has a bad smell to me, or at least points towards an inefficiency. It's incredible that a tiny number of samples and some inserted <wait> tokens can have such a huge effect on model behavior. I bet we'll see a way to have the network learn and "emerge" these capabilities during pre-training. We probably just need to look beyond the GPT objective.
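For the unfamiliar: the trick (s1 calls it "budget forcing") is that when the model tries to end its reasoning early, you suppress the end-of-thinking token and append "Wait" instead, so it keeps going. A toy sketch of that loop (the token strings and the mock model here are illustrative assumptions, not the paper's actual implementation):

```python
END_THINK = "</think>"  # assumed end-of-reasoning marker
WAIT = "Wait"

def generate_with_budget(model_step, min_tokens, max_tokens=50):
    """Token-by-token generation loop with budget forcing.

    model_step(context) -> next token; a stand-in for an LLM forward pass.
    If the model emits the end-of-thinking marker before `min_tokens`
    tokens of reasoning, the marker is replaced with "Wait" so the
    model continues reasoning instead of stopping.
    """
    context = []
    while len(context) < max_tokens:
        tok = model_step(context)
        if tok == END_THINK:
            if len(context) >= min_tokens:
                context.append(tok)  # budget met: let it stop
                break
            tok = WAIT  # budget not met: force more thinking
        context.append(tok)
    return context

def make_mock_model():
    """Mock model that tries to stop after 3 tokens, then after 7."""
    def step(context):
        if len(context) in (3, 7):
            return END_THINK
        return f"t{len(context)}"
    return step

out = generate_with_budget(make_mock_model(), min_tokens=6)
# The first stop attempt (at 3 tokens) gets replaced by "Wait";
# the second (at 7 tokens) is allowed through.
```

The surprising part is that this dumb intervention alone measurably improves accuracy on reasoning benchmarks.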
replies(2): >>42949122 #>>42953281 #
pas ◴[] No.42949122[source]
can you please elaborate on the wait tokens? what's that? how do they work? is that also from the R1 paper?
replies(2): >>42949165 #>>42956305 #
throwaway314155 ◴[] No.42956305[source]
There's a decent explanation in the article, just FYI.