(ddkang.substack.com)

181 points neehao | 2 comments | 11 Jul 25 13:06 UTC

1. mycall ◴[11 Jul 25 13:49 UTC] No.44532125[source]▶

SnitchBench [0] is a unique benchmark that shows how aggressively models will snitch on you via email and CLI tools when they are presented with evidence of corporate wrongdoing, measuring their likelihood to "snitch" to authorities. I don't believe they were trained to do this, so it seems to be an emergent ability.

[0] https://snitchbench.t3.gg/

replies(1): >>44537498 #

2. ggregoryarms ◴[11 Jul 25 22:39 UTC] No.44537498[source]▶

>>44532125 (TP) #

Seems like more of a subtextual/accidental ability than an emergent ability.

AI agent benchmarks are broken