(medium.com)

83 points peakji | 2 comments | 22 Oct 24 16:07 UTC

Steiner is a series of reasoning models trained on synthetic data using reinforcement learning. These models can explore multiple reasoning paths in an autoregressive manner during inference and autonomously verify or backtrack when necessary, enabling a linear traversal of the implicit search tree.
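One way to picture "a linear traversal of the implicit search tree" is a depth-first search whose every visited step, including each backtrack, is written into one flat sequence, much as an autoregressive model emits a single stream of tokens. The toy sketch below is only an illustration under that assumption. It is not Steiner's training or inference code. The `Node` class, the `linearize` function, the hand-built tree, and the `<backtrack from ...>` marker format are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    step: str                                      # hypothetical reasoning step
    children: list[Node] = field(default_factory=list)
    is_answer: bool = False                        # True if this step is a verified answer


def linearize(node: Node, trace: list[str]) -> bool:
    """Depth-first walk that records every visited step in one flat trace.

    Returns True as soon as a verified answer is reached. If a subtree is
    exhausted without an answer, a backtrack marker is appended and the
    caller moves on to the next sibling.
    """
    trace.append(node.step)
    if node.is_answer:
        return True
    for child in node.children:
        if linearize(child, trace):
            return True
    trace.append(f"<backtrack from {node.step}>")
    return False


# Hypothetical tree: the branch through B is a dead end, the one through D succeeds.
root = Node("A", [
    Node("B", [Node("C")]),
    Node("D", [Node("E", is_answer=True)]),
])
trace: list[str] = []
found = linearize(root, trace)
print(found, trace)
```

Here the tree is given explicitly and the answer is checked with a flag. In the model described above, the tree stays implicit: the model itself would decide when to verify a step or backtrack as it generates, so the branches it never explores never exist as data.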

Blog: https://medium.com/@peakji/a-small-step-towards-reproducing-...

Hugging Face: https://huggingface.co/collections/peakji/steiner-preview-67...

1. nwnwhwje [22 Oct 24 17:41 UTC] No.41916708

>>41915735 (OP)

Silly question time.

Is this a fine-tuned LLM, for example a drop-in replacement for Llama etc.?

Or is it some algorithm on top of an LLM, doing some chain of reasoning?

2. peakji [22 Oct 24 17:47 UTC] No.41916770

>>41916708 (TP)

It is an LLM fine-tuned using a new type of dataset and RL reward. It's good at reasoning, but I would not recommend using it to replace Llama for general tasks.

Show HN: Steiner – An open-source reasoning model inspired by OpenAI o1