301 points SerCe | 1 comment
1. kittikitti No.43114250
These benchmarks are not really representative of what agents are capable of. Slowly checking the weather through UI elements is not a good use case, yet that is what this non-peer-reviewed paper showcases.