(dynomight.substack.com)

696 points crescit_eundo | 2 comments | 14 Nov 24 17:05 UTC

1. Peteragain ◴[15 Nov 24 08:08 UTC] No.42144837

I would be interested to know if the good result is repeatable. We had a similar result with a quirky chat interface: one run gave great results (and we kept the video), but then we couldn't do it again. The cynical among us think there was a Mechanical Turk involved in our good run. The economics of venture capital mean that there is enormous pressure to justify techniques that we think of as "cheating". And of course the companies involved have the resources.

replies(1): >>42145676 #

2. tedsanders ◴[15 Nov 24 10:46 UTC] No.42145676

>>42144837 (TP) #

It's repeatable. OpenAI isn't cheating.

Source: I'm at OpenAI and I was one of the first people to ever play chess against the GPT-4 base model. You may or may not trust OpenAI, but we're just a group of people trying earnestly to build cool stuff. I've never seen any inkling of an attempt to cheat evals or cheat customers.

Something weird is happening with LLMs and chess