
83 points peakji | 2 comments | 22 Oct 24 16:07 UTC

Steiner is a series of reasoning models trained on synthetic data using reinforcement learning. These models can explore multiple reasoning paths in an autoregressive manner during inference and autonomously verify or backtrack when necessary, enabling a linear traversal of the implicit search tree.
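To make "linear traversal of the implicit search tree" concrete, here is a toy sketch. It is not Steiner's training or inference code: the tree, the goal node, and the `backtrack->` marker are all made up for illustration. It only shows how a depth-first search with backtracking can be flattened into one left-to-right sequence of steps, which is the shape of output an autoregressive model could emit.

```python
from typing import Dict, List

# Hypothetical toy "reasoning tree": each node maps to its child steps.
# None of these names come from Steiner; they are placeholders.
TREE: Dict[str, List[str]] = {
    "start": ["A", "B"],
    "A": ["A1"],
    "A1": [],      # dead end
    "B": ["B1"],
    "B1": [],      # the goal
}
GOAL = "B1"


def linearize(node: str, trace: List[str]) -> bool:
    """Depth-first search that records every visit and every backtrack
    as one flat sequence, appended to `trace` in order."""
    trace.append(node)
    if node == GOAL:
        return True
    for child in TREE[node]:
        if linearize(child, trace):
            return True
        # The child's subtree failed: record the return to this node.
        trace.append(f"backtrack->{node}")
    return False


trace: List[str] = []
found = linearize("start", trace)
print(found, trace)
```

The printed trace visits the dead-end branch under `A` first, records the steps back up, and then follows `B` to the goal, all as a single list.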

Blog: https://medium.com/@peakji/a-small-step-towards-reproducing-...

Hugging Face: https://huggingface.co/collections/peakji/steiner-preview-67...

1. zby [22 Oct 24 16:53 UTC] No.41916224

>>41915735 (OP) #

Can it be mixed with the sampling-based approaches from optillm (https://github.com/codelion/optillm)?

replies(1): >>41916366 #

2. peakji [22 Oct 24 17:08 UTC] No.41916366

>>41916224 (TP) #

Approaches like best-of-n sampling and majority voting are definitely feasible. But I don't recommend adding CoT-related techniques on top, as they might interfere with the model's internalized reasoning patterns.
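For readers unfamiliar with the two techniques named above, a minimal sketch follows. It assumes you have already sampled several completions from the model and extracted a final answer from each. The helper names `majority_vote` and `best_of_n` and the `score` callable are hypothetical and are not part of optillm or Steiner; in practice `score` would be something like a reward model or verifier.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the final answer that appears most often among the samples."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score.

    `score` is an assumed external scorer (for example a reward model);
    it is a placeholder here, not a real API.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Pretend these are final answers extracted from five sampled completions.
sampled_answers = ["42", "41", "42", "42", "17"]
voted = majority_vote(sampled_answers)

# Toy scorer: prefer longer candidates (illustration only).
picked = best_of_n(["a", "abc", "ab"], score=len)
print(voted, picked)
```

Both functions only look at the finished samples, so they leave the model's own reasoning trace untouched, which is consistent with the caveat about CoT above.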


Show HN: Steiner – An open-source reasoning model inspired by OpenAI o1