
68 points | peakji | 2 comments

Steiner is a series of reasoning models trained on synthetic data using reinforcement learning. These models can explore multiple reasoning paths in an autoregressive manner during inference and autonomously verify or backtrack when necessary, enabling a linear traversal of the implicit search tree.
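To make "linear traversal of the implicit search tree" concrete, here is a toy sketch. It is not Steiner's actual method or code: the tree, the goal check, and the `explore` / `verify` / `backtrack` step labels are all hypothetical, chosen only for illustration. It shows how a depth-first search with backtracking can be flattened into a single left-to-right sequence of steps, which is the shape an autoregressive model would have to emit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy search tree: each node maps to its child nodes.
TREE: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [],
    "A2": [],
    "B1": [],
}


def linear_trace(
    tree: Dict[str, List[str]],
    start: str,
    is_goal: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Depth-first search that records every step in one flat list.

    The returned trace is a linear sequence of explore / verify / backtrack
    steps, i.e. the search tree written out in the order it was visited.
    """
    trace: List[str] = []

    def visit(node: str) -> bool:
        trace.append(f"explore {node}")
        if is_goal(node):
            # Stand-in for a verification step: here it is just the goal check.
            trace.append(f"verify {node}: ok")
            return True
        for child in tree[node]:
            if visit(child):
                return True
        # No child led to the goal, so abandon this branch.
        trace.append(f"backtrack from {node}")
        return False

    found = visit(start)
    return trace, found


trace, found = linear_trace(TREE, "root", lambda node: node == "B1")
for step in trace:
    print(step)
```

With the goal placed at `B1`, the trace walks down the `A` branch, backtracks out of it, and then succeeds on the `B` branch. In a reasoning model the steps would be spans of generated text rather than node names, and deciding when to verify or backtrack would be learned rather than hard-coded.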

Blog: https://medium.com/@peakji/a-small-step-towards-reproducing-...

Hugging Face: https://huggingface.co/collections/peakji/steiner-preview-67...

1. nwnwhwje No.41916708
Silly question time.

Is this a fine-tuned LLM, for example a drop-in replacement for Llama etc.?

Or is it some algorithm on top of an LLM, doing some chain of reasoning?

replies(1): >>41916770 #
2. peakji No.41916770
It is an LLM fine-tuned using a new type of dataset and RL reward. It's good at reasoning, but I would not recommend it as a replacement for Llama on general tasks.