
625 points by lukebennett | 1 comment
iandanforth ◴[] No.42139410[source]
A few important things to remember here:

The best engineering minds have been focused on scaling transformer pre- and post-training for the last three years because they had good reason to believe it would work, and it has up until now.

Progress has been measured against benchmarks that are, or were, largely solvable with scale.

There is another emerging paradigm which is still small(er) scale but showing remarkable results. That's full multi-modal training with embodied agents (aka robots). 1x, Figure, Physical Intelligence, and Tesla are all making rapid progress on functionality that is definitely beyond frontier LLMs because it is distinctly different.

OpenAI/Google/Anthropic are not ignorant of this trend and are also reviving, or investing in, robotics and robot-adjacent research.

So while Orion and Claude 3.5 Opus may not be another shocking giant leap forward, that does not mean there aren't shocking giant leaps forward coming from slightly different directions.

replies(9): >>42139779 #>>42139984 #>>42140069 #>>42140194 #>>42140421 #>>42141563 #>>42142249 #>>42142983 #>>42143148 #
mvdtnz ◴[] No.42142249[source]
> The best engineering minds have been focused on scaling transformer pre- and post-training for the last three years because they had good reason to believe it would work, and it has up until now.

Or because the people running the companies that have fooled investors into believing it will work can afford to pay said engineers life-changing amounts of money.

replies(1): >>42149769 #
1. slashdave ◴[] No.42149769[source]
Improvements in transformer implementations (e.g. "Flash Attention") have saved gobs of money on training and inference; I'd guess more than the combined salaries of those researchers.
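
A rough, illustrative sketch of the kind of saving being pointed at (nothing here is from the thread; names and shapes are made up, and it assumes PyTorch >= 2.0): torch.nn.functional.scaled_dot_product_attention dispatches to a fused FlashAttention-style kernel when one is available, instead of materializing the full n-by-n score matrix the way a naive implementation does.

    # Naive attention vs. the fused kernel PyTorch dispatches to when it can.
    # Illustrative sketch only -- assumes PyTorch >= 2.0; CUDA helps but is optional.
    import torch
    import torch.nn.functional as F

    def naive_attention(q, k, v):
        # Materializes the full (seq_len x seq_len) score matrix: O(n^2) memory.
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return scores.softmax(dim=-1) @ v

    def fused_attention(q, k, v):
        # Same math, but a FlashAttention-style kernel (when available) works
        # tile-by-tile and never stores the full score matrix.
        return F.scaled_dot_product_attention(q, k, v)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # batch=2, heads=8, seq_len=1024, head_dim=64 -- made-up sizes for the sketch.
    q, k, v = (torch.randn(2, 8, 1024, 64, device=device) for _ in range(3))
    print(torch.allclose(naive_attention(q, k, v), fused_attention(q, k, v), atol=1e-3))

On long sequences the fused path drops the score-matrix activation memory from O(n^2) to roughly O(n) and runs noticeably faster, which is where the training and inference savings come from.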