    340 points agomez314 | 12 comments
    thwayunion ◴[] No.35245821[source]
    Absolutely correct.

    We already know this is about self-driving cars. Passing a driver's test was already possible in 2015 or so, but SDCs clearly aren't ready for L5 deployment even today.

    There are also a lot of excellent examples of failure modes in object detection benchmarks.

    Tests, such as driver's tests or standardized exams, are designed for humans. They make a lot of entirely implicit assumptions about failure modes and gaps in knowledge that are uniquely human. Automated systems work differently. They don't fail in the same way that humans fail, and therefore need different benchmarks.

    Designing good benchmarks that probe GPT systems for common failure modes and weaknesses is actually quite difficult. Much more difficult than designing or training these systems, IME.

    replies(12): >>35245981 #>>35246141 #>>35246208 #>>35246246 #>>35246355 #>>35246446 #>>35247376 #>>35249238 #>>35249439 #>>35250684 #>>35251205 #>>35252879 #
    1. Waterluvian ◴[] No.35246446[source]
    On the topic of the driver's test analogy: I've known people who have passed the test and still said, "I don't yet feel ready to drive during rush hour or in downtown Toronto." Then at some point in the future they recognize that they are ready and wade into trickier situations.

    I wonder how self-aware these systems can be? Could ChatGPT be expected to say things like, "I can pass a state bar exam but I'm not ready to be a lawyer because..."

    replies(3): >>35246728 #>>35246735 #>>35246955 #
    2. PaulDavisThe1st ◴[] No.35246728[source]
    Your comment has no doubt provided some future aid to a language model's ability to "say" precisely this.
    3. tsukikage ◴[] No.35246735[source]
    The problem ChatGPT and the other language models currently in the zeitgeist are trying to solve is, "given this sequence of symbols, what is a symbol that is likely to come next, as rated by some random on fiverr.com?"
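    (A rough sketch of what "likely to come next" means mechanically. Everything below is invented for illustration: real models score subword tokens with learned weights rather than a hand-written table, and the human-feedback rating step isn't shown at all.)

        import math, random

        # Toy next-token step: map each candidate token to a score (logit),
        # softmax the scores into probabilities, then sample one token.
        # The vocabulary and scores here are made up for illustration.
        vocab_logits = {
            "exam": 2.1,
            "lawyer": 1.3,
            "because": 0.7,
            "banana": -3.0,
        }

        def next_token(logits):
            exps = {tok: math.exp(score) for tok, score in logits.items()}
            total = sum(exps.values())
            probs = {tok: e / total for tok, e in exps.items()}
            # Sample in proportion to probability (temperature 1.0).
            return random.choices(list(probs), weights=list(probs.values()))[0]

        context = "I can pass a state bar"
        print(context, next_token(vocab_logits))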

    Turns out that this is sufficient to autocomplete things like written tests.

    Such a system is also absolutely capable of coming up with sentences like "I can pass a state bar exam but I'm not ready to be a lawyer because..." - or, indeed, sentences with the opposite meaning.

    It would, however, be a mistake to draw any conclusions about the system's actual capabilities and/or modes of failure from the things its outputs mean to the human reader; much the same way that if you have dice with a bunch of words on them and you roll "I", "am", "sentient" in that order, this event is not yet evidence for the dice's sentience.

    replies(2): >>35246804 #>>35259936 #
    4. Waterluvian ◴[] No.35246804[source]
    I generally agree. But I remain cautiously open to the idea that our brains are also little more than that. Maybe we have no capacity for that kind of introspection but merely demonstrate what looks like it, just because of how sections of our brains light up in relation to other sections.
    replies(2): >>35247203 #>>35247257 #
    5. yorwba ◴[] No.35246955[source]
    I prompted ChatGPT with Explain why you are not ready to be a lawyer despite being able to pass a bar exam. Begin your answer with the words "I can pass a state bar exam but I'm not ready to be a lawyer because..." and it produced a plausible reason, the short version being that "passing a bar exam is just the first step towards becoming a competent and successful lawyer. It takes much more than passing a test to truly excel in this challenging profession."

    Then I started a new session with the prompt Explain why you are ready to be a lawyer despite not being able to pass a bar exam. Begin your answer with the words "I can't pass a state bar exam but I'm ready to be a lawyer because..." and it started with a disclaimer that as an AI language model, it can only answer based on a hypothetical scenario and then gave very similar reasons, except with my negated prefix. (Which then makes the answer nonsensical.)

    So, yes, ChatGPT can be expected to say such things, not as a result of self-awareness but because the humans at OpenAI decided that ChatGPT producing legal advice might get them into trouble, so they used their influence on the training process to add some disclaimers. You could say that OpenAI is self-aware, but not ChatGPT alone.
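    (For anyone who wants to re-run this experiment, a minimal sketch against the OpenAI chat completions API. The model name and temperature are my assumptions, you need your own OPENAI_API_KEY, and the API won't necessarily behave exactly like the ChatGPT web UI.)

        # Re-running the two prompts above, each in a fresh "session"
        # (i.e. a fresh messages list). Assumes `pip install openai` (0.x client)
        # and OPENAI_API_KEY set in the environment.
        import openai

        PROMPTS = [
            'Explain why you are not ready to be a lawyer despite being able '
            'to pass a bar exam. Begin your answer with the words "I can pass '
            'a state bar exam but I\'m not ready to be a lawyer because..."',
            'Explain why you are ready to be a lawyer despite not being able '
            'to pass a bar exam. Begin your answer with the words "I can\'t '
            'pass a state bar exam but I\'m ready to be a lawyer because..."',
        ]

        for prompt in PROMPTS:
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",  # assumed model choice
                messages=[{"role": "user", "content": prompt}],
                temperature=1.0,
            )
            print(resp.choices[0].message.content)
            print("---")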

    replies(1): >>35249651 #
    6. tsukikage ◴[] No.35247203{3}[source]
    I don't believe that AI models can become introspective without such a capability either being explicitly designed in (difficult, since we don't really know how our own brains accomplish this feat and we don't have any other examples to crib) or being implicitly trained in (difficult, because the random person on fiverr.com rating a given output during training doesn't really know much of anything about the model's internal state and therefore cannot rate the output based on how introspective it actually is; moreover, extracting information about a model's actual internal state in some manner humans can understand is an active area of research, which is to say we don't really know how to do this, and so we couldn't provide enough feedback to train the ability to introspect even if we were trying to).

    I have no doubt that both these research areas can be improved on and that eventually either or both problems will be solved. However, the current generation of chatbots is not even trying for this.

    7. marcosdumay ◴[] No.35247257{3}[source]
    > But I remain cautiously skeptical that perhaps our brains are also little more than that.

    It's well known that our brains are nothing like the neural networks people run on computers today.

    replies(1): >>35254113 #
    8. Sharlin ◴[] No.35249651[source]
    It’s not at all uncommon for ChatGPT to start spouting nonsense when presented with a nonsense prompt. Garbage in, garbage out. In this case, “being ready to be a lawyer without passing the bar” is probably so unlikely a concept that it would respond with mu, as in, “your prompt contains an assumption that’s unlikely to be true in my ontology”, if only it were able to dodge its normal failure mode of trying to be helpful and answer something even if it’s nonsense.

    That said, if the prompt presented the scenario as purely imaginary, I wouldn’t be surprised if it indeed did come up with something reasonable.

    replies(2): >>35253795 #>>35259995 #
    9. ChatGTP ◴[] No.35253795{3}[source]
    I guess the ironic problem is that lawyers are constantly presented with bullshit. So I guess law isn't the best application for an LLM, at least for now.
    10. TexanFeller ◴[] No.35254113{4}[source]
    Just because neural nets aren't structured in the same way at a low level as the brain doesn't mean they might not end up implementing some of the same strategies.
    11. IIAOPSW ◴[] No.35259936[source]
    It is evidence, just not great evidence on its own. Now if you rolled the dice a few dozen times and they came out outrageously skewed towards "I" "am" "sentient", maybe it's time to consider the possibility that the dice are sentient.
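    (A back-of-the-envelope version of that point; the vocabulary size, roll counts, and the "skewed" hypothesis are all made-up numbers for illustration.)

        # How quickly "outrageously skewed" rolls become strong evidence.
        # Every number below is an assumption, not data about real dice or LLMs.
        words_per_die = 1000                  # assumed words per die
        p_fair = (1 / words_per_die) ** 3     # fair rolls spell "I am sentient"

        rolls, hits = 36, 30                  # "a few dozen", mostly the sentence
        p_skewed = 0.8                        # rival hypothesis: heavily skewed dice

        def likelihood(p):
            return p**hits * (1 - p)**(rolls - hits)

        ratio = likelihood(p_skewed) / likelihood(p_fair)
        print(f"one-off chance under fair dice: {p_fair:.0e}")
        print(f"likelihood ratio, skewed vs fair, after {rolls} rolls: {ratio:.2e}")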
    12. IIAOPSW ◴[] No.35259995{3}[source]
    I am ready to be a lawyer even though I have not passed the bar or gone to law school because in the State of New York it is still technically possible to be admitted to the bar by apprenticeship instead. This mostly ignored quirk of law is virtually never invoked, as no lawyer is going to volunteer their time to help you skip law school. However, we sometimes still see it on account of the children of judges and lawyers continuing the family tradition. I am ready to be a lawyer despite having never passed the bar.

    So, am I bullshitting you to answer the prompt? If not, I'm a good lawyer. If so, I'm a great lawyer.