
S1: A $6 R1 competitor?

(timkellogg.me)
851 points tkellogg | 9 comments
1. bberenberg ◴[] No.42947651[source]
In case you’re not sure what S1 is, here is the original paper: https://arxiv.org/html/2501.19393v1
replies(5): >>42947676 #>>42947678 #>>42947692 #>>42958345 #>>42961055 #
2. mi_lk ◴[] No.42947676[source]
it's also the first link in the article's first sentence
replies(1): >>42947722 #
3. ◴[] No.42947678[source]
4. addandsubtract ◴[] No.42947692[source]
It's linked in the blog post, too. In the first sentence, actually, but for some reason the author never bothered to attach the name to it. As if keeping track of o1, 4o, r1, and r2d2 wasn't exhausting enough already.
replies(1): >>42947833 #
5. bberenberg ◴[] No.42947722[source]
Good call, I must have missed it. I read the whole blog then went searching for what S1 was.
6. kgwgk ◴[] No.42947833[source]
> for some reason the author never bothered to attach the name to it

Respect for his readers’ intelligence, maybe.

7. rahimnathwani ◴[] No.42958345[source]

> To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation.

Does this mean that the end-of-thinking delimiter is a single token? Presumably </think> or similar wasn't a single token for the base model. Did they just pick a pair of uncommon single-token symbols to use as delimiters?

EDIT: Never mind, end of thinking is represented with <|im_start|> followed by the word 'answer', so the code dynamically adds/removes <|im_start|> from the list of stop tokens.
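
For anyone trying to picture the mechanism, here is a minimal sketch of the idea in Python. It is not the s1 authors' code: `budget_force`, `toy_step`, and the `min_tokens`/`max_tokens` parameters are hypothetical names, the "model" is a toy function, and the delimiter is suppressed token by token rather than by editing a stop-token list as the real code apparently does.

```python
from typing import Callable, List

# Delimiter named in the comment above; treated here as a single token.
END_OF_THINKING = "<|im_start|>"


def budget_force(
    step: Callable[[List[str]], str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Build a reasoning trace that is at least min_tokens long.

    `step` is a hypothetical stand-in for one decoding step: it receives
    the trace so far and returns the next token.
    """
    trace: List[str] = []
    while len(trace) < max_tokens:
        token = step(trace)
        if token == END_OF_THINKING:
            if len(trace) >= min_tokens:
                break  # minimum reached, let the model stop thinking
            # Below the minimum: drop the delimiter and append "Wait"
            # so the model keeps reasoning.
            trace.append("Wait")
            continue
        trace.append(token)
    return trace


def toy_step(trace: List[str]) -> str:
    """Toy model (assumption): tries to stop whenever len(trace) % 3 == 2."""
    if len(trace) % 3 == 2:
        return END_OF_THINKING
    return f"t{len(trace)}"


if __name__ == "__main__":
    print(budget_force(toy_step, min_tokens=5, max_tokens=20))
```

With `min_tokens=0` the toy model stops at its first attempt; with a higher minimum, each early attempt to stop is replaced by "Wait" until the minimum is met or `max_tokens` cuts the trace off.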

8. dagurp ◴[] No.42961055[source]
I don't know what R1 is either
replies(1): >>42961804 #
9. latexr ◴[] No.42961804[source]
It’s the DeepSeek reasoning model.