
S1: A $6 R1 competitor?

(timkellogg.me)
851 points | tkellogg
jebarker ◴[] No.42948939[source]
S1 (and R1, tbh) has a bad smell to me, or at least points towards an inefficiency. It's incredible that a tiny number of samples and some inserted <wait> tokens can have such a huge effect on model behavior. I bet we'll see a way to have the network learn and "emerge" these capabilities during pre-training. We probably just need to look beyond the GPT objective.
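For the unfamiliar: the trick (s1 calls it "budget forcing") is that when the model tries to end its reasoning early, you suppress the end-of-thinking token and append "Wait" instead, so it keeps going. A toy sketch of that loop (the token strings and the mock model here are illustrative assumptions, not the paper's actual implementation):

```python
END_THINK = "</think>"  # assumed end-of-reasoning marker
WAIT = "Wait"

def generate_with_budget(model_step, min_tokens, max_tokens=50):
    """Token-by-token generation loop with budget forcing.

    model_step(context) -> next token; a stand-in for an LLM forward pass.
    If the model emits the end-of-thinking marker before `min_tokens`
    tokens of reasoning, the marker is replaced with "Wait" so the
    model continues reasoning instead of stopping.
    """
    context = []
    while len(context) < max_tokens:
        tok = model_step(context)
        if tok == END_THINK:
            if len(context) >= min_tokens:
                context.append(tok)  # budget met: let it stop
                break
            tok = WAIT  # budget not met: force more thinking
        context.append(tok)
    return context

def make_mock_model():
    """Mock model that tries to stop after 3 tokens, then after 7."""
    def step(context):
        if len(context) in (3, 7):
            return END_THINK
        return f"t{len(context)}"
    return step

out = generate_with_budget(make_mock_model(), min_tokens=6)
# The first stop attempt (at 3 tokens) gets replaced by "Wait";
# the second (at 7 tokens) is allowed through.
```

The surprising part is that this dumb intervention alone measurably improves accuracy on reasoning benchmarks.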
replies(2): >>42949122 #>>42953281 #
pas ◴[] No.42949122[source]
can you please elaborate on the wait tokens? what's that? how do they work? is that also from the R1 paper?
replies(2): >>42949165 #>>42956305 #
throwaway314155 ◴[] No.42956305[source]
There's a decent explanation in the article, just FYI.