S1: A $6 R1 competitor?

1. bberenberg ◴[05 Feb 25 12:38 UTC] No.42947651[source]▶

>>42946854 (OP) #

In case you’re not sure what S1 is, here is the original paper: https://arxiv.org/html/2501.19393v1

replies(5): >>42947676 #>>42947678 #>>42947692 #>>42958345 #>>42961055 #

2. mi_lk ◴[05 Feb 25 12:41 UTC] No.42947676[source]▶

>>42947651 (TP) #

it's also the first link in the article's first sentence

replies(1): >>42947722 #

4. addandsubtract ◴[05 Feb 25 12:42 UTC] No.42947692[source]▶

>>42947651 (TP) #

It's linked in the blog post, too. In the first sentence, actually, but for some reason the author never bothered to attach the name to it. As if keeping track of o1, 4o, r1, and r2d2 wasn't exhausting enough already.

replies(1): >>42947833 #

5. bberenberg ◴[05 Feb 25 12:44 UTC] No.42947722[source]▶

>>42947676 #

Good call, I must have missed it. I read the whole blog then went searching for what S1 was.

6. kgwgk ◴[05 Feb 25 12:55 UTC] No.42947833[source]▶

>>42947692 #

> for some reason the author never bothered to attach the name to it

Respect for his readers’ intelligence, maybe.

7. rahimnathwani ◴[06 Feb 25 02:38 UTC] No.42958345[source]▶

>>42947651 (TP) #

> To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation.

Does this mean that the end-of-thinking delimiter is a single token? Presumably </think> or similar wasn't a single token for the base model. Did they just pick a pair of uncommon single-token symbols to use as delimiters?

EDIT: Never mind, end of thinking is represented with <|im_start|> followed by the word 'answer', so the code dynamically adds/removes <|im_start|> from the list of stop tokens.
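For anyone who wants to see the idea in code, here is a rough sketch of how I read the quoted passage. It is not the paper's actual implementation. The `generate` callable, its signature, `think_with_minimum`, and the `max_waits` cap are all made up for illustration; the only things taken from the thread are the `<|im_start|>` delimiter followed by `answer`, and the `Wait` string. The sketch treats `<|im_start|>` as a stop token while the model is thinking, drops it whenever the trace is still under the minimum, and appends `Wait` so the model keeps reasoning.

```python
from typing import Callable, List, Sequence

END_OF_THINKING = "<|im_start|>"  # per the thread: followed by the word "answer"
WAIT = "Wait"

# Hypothetical model interface (an assumption, not a real library API):
# takes the context tokens and a list of stop tokens, and returns the newly
# generated tokens, halting *before* emitting any stop token.
GenerateFn = Callable[[Sequence[str], Sequence[str]], List[str]]


def think_with_minimum(
    generate: GenerateFn,
    prompt: List[str],
    min_thinking_tokens: int,
    max_waits: int = 4,  # illustrative safety cap so the loop always ends
) -> List[str]:
    """Return a reasoning trace of at least `min_thinking_tokens` tokens
    (unless `max_waits` is exhausted), closed by the end-of-thinking marker."""
    trace: List[str] = []
    waits = 0
    while True:
        # The delimiter is a stop token here, so the model halts where it
        # would have ended its thinking; the delimiter itself is not kept.
        trace += generate(prompt + trace, [END_OF_THINKING])
        if len(trace) >= min_thinking_tokens or waits >= max_waits:
            break
        # Too short: suppress the end of thinking and nudge the model on.
        trace.append(WAIT)
        waits += 1
    # Only now is the delimiter allowed into the output.
    return trace + [END_OF_THINKING, "answer"]
```

In the answer phase the same `generate` would be called again with `<|im_start|>` removed from the stop list, which is the add/remove behavior described in the EDIT above.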

8. dagurp ◴[06 Feb 25 10:42 UTC] No.42961055[source]▶

>>42947651 (TP) #

I don't know what R1 is either

replies(1): >>42961804 #

9. latexr ◴[06 Feb 25 12:41 UTC] No.42961804[source]▶

>>42961055 #

It’s the DeepSeek reasoning model.