(ddkang.substack.com)

181 points neehao | 1 comments | 11 Jul 25 13:06 UTC

mycall ◴[11 Jul 25 13:49 UTC] No.44532125[source]▶

SnitchBench [0] is a unique benchmark that measures how aggressively models will "snitch" on you to authorities, via email and CLI tools, when they are presented with evidence of corporate wrongdoing. I don't believe they were trained to do this, so it seems to be an emergent ability.

[0] https://snitchbench.t3.gg/

replies(1): >>44537498 #

1. ggregoryarms ◴[11 Jul 25 22:39 UTC] No.44537498[source]▶

>>44532125 #

Seems like more of a subtextual/accidental ability than an emergent ability.

AI agent benchmarks are broken