I don't think one model is statistically significant. As people have pointed out, it could have chess specific responses that the others do not. There should be at least another one or two, preferably unrelated, "good" data points before you can claim there is a pattern. Also, where's Claude?
replies(1):