
296 points by todsacerdoti | 1 comment
marcosdumay No.44367966
Yeah, make the network deeper.

When all you have is a hammer... It makes a lot of sense that a transformation layer that makes the tokens more semantically relevant would help optimize the entire network after it and increase the effective size of your context window. And one of the main immediate obstacles stopping those models from being intelligent is context window size.
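
A minimal sketch of one way to read that idea (not from the article; the module name and residual-MLP design here are my own assumptions): a learned pre-transform applied to token embeddings before an existing transformer stack, which deepens the network without touching the downstream layers.

    import torch
    import torch.nn as nn

    class TokenPreTransform(nn.Module):  # hypothetical name
        def __init__(self, d_model: int):
            super().__init__()
            # A small residual MLP that re-maps each token embedding in place,
            # leaving the downstream network and sequence length untouched.
            self.norm = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model) token embeddings
            return x + self.mlp(self.norm(x))

    # Usage: drop it in front of an existing stack, effectively "making the network deeper".
    d_model = 512
    pre = TokenPreTransform(d_model)
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        num_layers=6,
    )
    tokens = torch.randn(2, 128, d_model)  # stand-in for embedded tokens
    out = encoder(pre(tokens))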

On the other hand, the current models already cost something on the order of the median country's GDP to train, and they are nowhere close to that in value. The saying that "if brute force didn't solve your problem, you didn't apply enough force" is meant to be taken as a joke.

replies(4): >>44368591 #>>44368640 #>>44374381 #>>44377110 #
hiddencost No.44377110
I mean, brute force is working great. Acceleration is large, and cost per unit of intelligence is dropping much faster than Moore's law. We've been doing this kind of scaling since the '80s across various measures, and we're quite good at it.