
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 2 comments
pona-a No.42948636
If chain of thought acts as a scratch buffer, giving the model more temporary "layers" to process the text, I wonder whether it would make sense to make this buffer a separate context with its own FFN and attention. In essence, there would be a macroprocess of "reasoning" that takes unbounded time to complete, and a microprocess of describing that otherwise incomprehensible stream of embedding vectors in natural language, in a way returning to the encoder/decoder architecture, but with both halves autoregressive. Maybe this would give us a denser representation of said "thought", not constrained by imitating human text.
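A minimal sketch of what the comment describes, assuming a PyTorch-style setup: an inner loop that autoregressively extends a buffer of latent "thought" vectors with its own attention and FFN (no tokens emitted), followed by a separate autoregressive decoder that cross-attends into the finished buffer to verbalize it. All module names, sizes, and the fixed step count are illustrative assumptions, not any existing model's architecture.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Hypothetical 'macroprocess': extend a latent scratch buffer step by step."""
    def __init__(self, d_model=512, n_heads=8, n_steps=16):
        super().__init__()
        self.n_steps = n_steps
        # Separate attention + feed-forward stack used only for latent reasoning.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        # Projects the current state into the next latent "thought" vector.
        self.next_thought = nn.Linear(d_model, d_model)

    def forward(self, prompt_state):                # (batch, prompt_len, d_model)
        buffer = prompt_state
        for _ in range(self.n_steps):               # fixed here; could be adaptive
            ctx = self.block(buffer)                # attend over everything so far
            new = self.next_thought(ctx[:, -1:])    # append one latent step
            buffer = torch.cat([buffer, new], dim=1)
        return buffer                               # dense, non-textual scratch space

class ThoughtVerbalizer(nn.Module):
    """Hypothetical 'microprocess': autoregressively describe the buffer in text."""
    def __init__(self, d_model=512, n_heads=8, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.decoder = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, tokens, latent_buffer):       # cross-attend into the buffer
        x = self.embed(tokens)
        x = self.decoder(tgt=x, memory=latent_buffer)
        return self.lm_head(x)                      # next-token logits
```

The point of the split is that the inner loop is never trained to imitate human text, only the outer decoder is, which is roughly the encoder/decoder framing the comment mentions with both halves autoregressive.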
easeout No.42950000
Here's a paper your idea reminds me of. https://arxiv.org/abs/2501.19201

It's also not far from Meta's Large Concept Models idea.

pona-a No.42950129
Previous discussion:

[41 comments, 166 points] https://news.ycombinator.com/item?id=42919597