AI agent benchmarks are broken

(ddkang.substack.com)
181 points by neehao | 2 comments
1. RansomStark No.44531891
I really like the CMU Agents Company approach of simulating a real-world environment [0]. Is it perfect? No. Does it show you what to expect in production? Not really, but it's much closer than anything else I've seen.

[0] https://the-agent-company.com/

2. yeahyeahok No.44534952 (reply to No.44531891)
Damn. Super bullish on CMU. Somehow they seem to be routinely left out of the top CS schools discussion, at least in mainstream discourse (MIT, Stanford, Cal, ...). I've seen a disproportionate amount of stellar research come from there. Also, interestingly, I have met really incompetent people from each of the other top three schools but have yet to meet an incompetent CMU SCS alum. Wtf are they feeding them in Pittsburgh??