
579 points | paulpauper | 1 comment
1. dcl No.43607858
I like this bit:

> Personally, when I want to get a sense of capability improvements in the future, I'm going to be looking almost exclusively at benchmarks like Claude Plays Pokemon.

Definitely interested to see how the best models from Anthropic's competitors do at this.