152 points fzliu | 1 comment
bigdict ◴[] No.43562732[source]
Sure, you can get better model performance by throwing more compute at the problem in different places. Does it improve perf on an isoflop basis?
replies(4): >>43562773 #>>43563245 #>>43563544 #>>43564050 #
1. eightysixfour ◴[] No.43563245[source]
That's... not always a given for SOTA-sized models. When the ROI on more training stops, it is nice to have alternatives, whether that is RL-tuned reasoning models or alternative architectures that improve specific areas of weakness.