
S1: A $6 R1 competitor?

(timkellogg.me)
851 points | tkellogg | 1 comment
pona-a No.42948636
If chain of thought acts as a scratch buffer, giving the model more temporary "layers" to process the text, I wonder whether it would make sense to make that buffer a separate context with its own FFN and attention. In essence, there would be a macroprocess of "reasoning" that takes unbounded time to complete, and then a microprocess of describing that otherwise incomprehensible stream of embedding vectors in natural language; in a way, this returns to the encoder/decoder architecture, but with both halves autoregressive. Maybe this would give us a denser representation of said "thought", not constrained by imitating human text.
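A minimal numpy sketch of that two-stage split, purely illustrative: the toy weights, step count, attention rule, and vocabulary are all assumptions, not any real model's architecture. The "macroprocess" iterates autoregressively over raw embedding vectors (never touching tokens), and only afterwards does a "microprocess" decode a latent into natural language:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8            # latent embedding width (assumed)
STEPS = 5        # latent reasoning steps; unbounded in principle
VOCAB = ["yes", "no", "maybe"]  # toy output vocabulary

# Toy weights standing in for the reasoner's attention/FFN and a decoder head.
W_reason = rng.standard_normal((D, D)) * 0.1
W_decode = rng.standard_normal((D, len(VOCAB)))

def reason(latents):
    """Macroprocess step: attend over all prior latent vectors and emit a
    new embedding, with no detour through natural-language tokens."""
    H = np.stack(latents)                 # (t, D) history of latent "thoughts"
    q = latents[-1] @ W_reason            # query derived from the last latent
    scores = H @ q / np.sqrt(D)           # scaled dot-product attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over the history
    return np.tanh(weights @ H)           # next latent vector

def decode(latent):
    """Microprocess: verbalize one latent vector as a token."""
    logits = latent @ W_decode
    return VOCAB[int(np.argmax(logits))]

# Reason entirely in latent space...
latents = [rng.standard_normal(D)]
for _ in range(STEPS):
    latents.append(reason(latents))

# ...and only describe the result in language at the end.
answer = decode(latents[-1])
print(answer, len(latents))
```

The point of the sketch is only the separation of concerns: the inner loop's state is a dense vector stream that no human-text imitation objective ever constrains, and language is produced by a distinct decoding pass.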
jjk7 No.42952388
Comments on a Google Doc? Nesting in social media comments?

Seems like a similar concept. I think there is some potential in improving how LLMs develop and extend their own lines of reasoning, but I'm no AI mage.