39 points suchintan | 3 comments
1. skull8888888 ◴[] No.42743888[source]
Isn't browser-use SOTA on WebVoyager? At this point WebVoyager seems outdated; there's definitely a need for a new, harder benchmark.
replies(2): >>42744200 #>>42744614 #
2. suchintan ◴[] No.42744200[source]
Definitely need a newer benchmark.

I couldn't find where browser-use published their run results (I expected to see them here: https://github.com/browser-use/eval).

We went ahead and published our full run at https://eval.skyvern.com so it can be independently audited.

3. MagMueller ◴[] No.42744614[source]
Yes, here is the browser-use report with 89%: https://browser-use.com/posts/sota-technical-report

We definitely need a new dataset with more complex tasks, like uploading files, handling multiple tabs, and handling many more steps.