688 points crescit_eundo | 10 comments
1. lukev ◴[] No.42143161[source]
I don't necessarily believe this, but I'm going to suggest it because I'm feeling spicy.

OpenAI clearly downgrades some of their APIs from their maximal theoretic capability, for the purposes of response time/alignment/efficiency/whatever.

Multiple comments in this thread also say they couldn't reproduce the results for gpt-3.5-turbo-instruct.

So what if the OP just happened to test at a time, or be IP bound to an instance, where the model was not nerfed? What if 3.5 and all subsequent OpenAI models can perform at this level but it's not strategic or cost effective for OpenAI to expose that consistently?

For the record, I don't actually believe this. But given the data it's a logical possibility.

replies(3): >>42143229 #>>42143264 #>>42144445 #
2. TZubiri ◴[] No.42143229[source]
Stallman may have his flaws, but this is why serious research happens with source code (or at least with binaries).
3. zeven7 ◴[] No.42143264[source]
Why do you doubt it? I thought it was well known that ChatGPT has degraded over time for the same model, mostly for cost-saving reasons.
replies(1): >>42143324 #
4. permo-w ◴[] No.42143324[source]
ChatGPT is, understandably, blatantly different in the browser compared to the app, or at least it was until I deleted the app.
replies(1): >>42143446 #
5. lukan ◴[] No.42143446{3}[source]
I don't understand that. The app doesn't do any processing; it's just a UI that sends text to and from the server.
replies(1): >>42144820 #
6. com2kid ◴[] No.42144445[source]
> OpenAI clearly downgrades some of their APIs from their maximal theoretic capability, for the purposes of response time/alignment/efficiency/whatever.

When ChatGPT (3.5) first came out, people were using it to simulate entire Linux system installs, and even to browse a simulated Internet.

Cool use cases like that aren't even discussed anymore.

I still wonder what sort of magic OpenAI had and then locked up away from the world in the name of cost savings.

Same thing with GPT-4 vs. 4o: 4o is obviously worse in some ways, but after the initial release (when a bunch of people pointed this out), the issue has just been collectively ignored.

replies(2): >>42144529 #>>42146045 #
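For context, the "simulated Linux" trick was plain prompting: instruct the model to role-play a shell and reply only with terminal output. A minimal sketch of how such a session would be set up; the prompt wording is a paraphrase of the widely shared version, not an exact quote, and the message format follows the common chat-API convention:

```python
# Role-play prompt for a simulated terminal. Wording is illustrative,
# paraphrased from the prompt people circulated at the time.
TERMINAL_PROMPT = (
    "I want you to act as a Linux terminal. I will type commands and you "
    "will reply with what the terminal should show. Reply only with the "
    "terminal output inside one code block, and nothing else."
)

def make_session(first_command: str) -> list[dict]:
    """Build the opening messages for a simulated-terminal chat session."""
    return [
        {"role": "system", "content": TERMINAL_PROMPT},
        {"role": "user", "content": first_command},
    ]

session = make_session("ls -la /")
assert session[0]["role"] == "system"
assert session[1]["content"] == "ls -la /"
```

The whole "magic" lives in the system message; the model fills in plausible shell output turn by turn, which is also why the simulation stays shallow.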
7. golol ◴[] No.42144529[source]
You can still do this. People just lost interest in this stuff once it became clear to what degree the simulation is really being done (shallowly).

Yet I do wish we had access to less finetuned/distilled/RLHF'd models.

8. isaacfrond ◴[] No.42144820{4}[source]
There is a small difference between the app and the browser: before each session, the LLM is started with a system prompt, and these prompts differ between the app and the browser. You can find them online somewhere, but IIRC the app is instructed to give shorter answers.
replies(1): >>42149617 #
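Mechanically, that just means the payload sent to the model starts with a different system message depending on the client, while the user's turn is identical. A sketch under that assumption; the prompt strings here are invented for illustration, not OpenAI's actual prompts:

```python
# Hypothetical per-client system prompts. The real wording OpenAI uses is
# not reproduced here; these strings only illustrate the mechanism.
SYSTEM_PROMPTS = {
    "app": "You are ChatGPT. Keep answers brief and succinct.",
    "browser": "You are ChatGPT. Answer thoroughly.",
}

def build_messages(surface: str, user_text: str) -> list[dict]:
    """Prepend the client-specific system prompt to the user's message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[surface]},
        {"role": "user", "content": user_text},
    ]

app_msgs = build_messages("app", "Explain TCP handshakes.")
web_msgs = build_messages("browser", "Explain TCP handshakes.")

# Same user turn, different system instruction -> different answer style.
assert app_msgs[1] == web_msgs[1]
assert app_msgs[0] != web_msgs[0]
```

So the model weights are the same; only the instruction prefix differs, which is enough to produce noticeably shorter answers in one client than the other.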
9. ipsum2 ◴[] No.42146045[source]
People are doing this all the time with Claude 3.5.
10. bongodongobob ◴[] No.42149617{5}[source]
Correct; it's different in a mobile browser too: the system prompt tells it to be brief/succinct. I always switch to desktop mode when using it on my phone.