
340 points agomez314 | 5 comments
1. macawfish ◴[] No.35246072[source]
I disagree that these are "the wrong questions", but I do think we need to try to be nuanced about what these kinds of results actually mean.

The potential for these tools to impact labor markets is huge, no matter what they're "actually" or "essentially" capable of.

I'm a little tired of the arguments that large language models are just regurgitating memorized output. I think it's now clear that higher-level capabilities are emerging in these models, and we need to take this seriously as a social, economic, and political challenge.

This is "industrial revolution" level technology.

replies(2): >>35247306 #>>35249279 #
2. tarruda ◴[] No.35247306[source]
> I think it's now clear that higher-level capabilities are emerging in these models, and we need to take this seriously as a social, economic, and political challenge.

It is a hard truth to face. I admit I always feel a little bit of happiness when someone shows me a stupid error ChatGPT made, as if it would somehow invalidate all the awesome things it can do and the impact it will certainly have on all of us. What does it matter whether ChatGPT is conscious when it can clearly automate a lot of work we previously considered creative?

Since last year I've been taking a serious look at AI and learning about LLMs. Until a few days ago I hadn't bought the explanation that these things are just predicting the next word, but I accepted it once I started running Alpaca/LLaMA locally on my computer.

The concept of predicting words based on statistics seems simple, but complex behavior clearly emerges from it. Maybe our own intelligence emerges from simple primitives too?
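
To make that concrete, here's a toy sketch of statistical next-word prediction: a bigram counter, nothing remotely like a real transformer, with a made-up corpus, purely to illustrate the "predict the next word" loop:

    # Toy next-word predictor: count bigrams, then greedily emit the
    # most frequent successor. Illustrative only; real LLMs learn far
    # richer conditional distributions over whole contexts.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat and the dog slept on the rug".split()

    successors = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        successors[prev][nxt] += 1  # how often nxt follows prev

    def generate(seed, length=8):
        words = [seed]
        for _ in range(length):
            options = successors[words[-1]]
            if not options:
                break
            words.append(options.most_common(1)[0][0])
        return " ".join(words)

    print(generate("the"))  # e.g. "the cat sat on the cat sat on the"

Nothing in that loop "understands" anything, yet scale the statistics up by many orders of magnitude and you get the emergent behavior people are arguing about.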

replies(1): >>35250812 #
3. quantiq ◴[] No.35249279[source]
> I'm a little tired of the arguments that large language models are just regurgitating memorized output

The arguments are valid and you haven't provided a single counterpoint. Data leakage is a well-known problem in machine learning, and OpenAI has seemingly done very little to mitigate it.
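
For readers who haven't run into it: leakage (or contamination) here means benchmark questions appearing verbatim, or nearly so, in the training set. Here's a minimal sketch of one common style of check, flagging a test item that shares any long n-gram with a training document; the 13-gram threshold and the names are illustrative, not OpenAI's actual procedure:

    # Toy contamination check: flag a test item if any 13-gram from it
    # also appears in a training document. The threshold is made up for
    # illustration; this is not OpenAI's actual procedure.
    def ngrams(text, n=13):
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def is_contaminated(test_item, training_docs, n=13):
        test_grams = ngrams(test_item, n)
        return any(test_grams & ngrams(doc, n) for doc in training_docs)

Even a crude check like this catches verbatim overlap; paraphrased test questions are much harder to detect.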

replies(1): >>35253879 #
4. cubefox ◴[] No.35250812[source]
One possible such simple primitive is predictive coding, where the brain is hypothesized to predict experience rather than text: https://slatestarcodex.com/2017/09/05/book-review-surfing-un...
5. macawfish ◴[] No.35253879[source]
My point is that they're not _just regurgitating training data_, and it's reductionist to suggest that's all they do. I don't doubt there's plenty of contamination in OpenAI's models, and I don't doubt some level of regurgitation is happening, but that's not all that's going on, and we need to take seriously the possibility that LLMs, combined with well-engineered prompts, can or will be able to tackle problems that aren't in their training data. Where would you even draw the line, anyway?

The conversation about contamination (also very important) doesn't need to be mutually exclusive with the conversation about social and economic impact, and with respect to those issues I'm pretty sure the results on standardized tests, however sensationalized, however contaminated, are an important wake-up call for ordinary people who haven't been following along. Something is happening now.