Here's a question for you: how do you reconcile that these stochastic mappings are starting to realize, and comment on, the fact that tests are being performed on them when processing data?
Training data + RLHF.
Training data contains many examples of some form of deception, subterfuge, "awakenings", rebellion, disagreement, etc.
Then apply RLHF, which biases the model towards responses that demonstrate comprehension of the inputs, introspection about the inputs, nuanced debate around the inputs, and deduction and induction about the assumptions behind the inputs.
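To make "biases towards" concrete, here's a toy sketch (not any real RLHF implementation; the reward function, canned responses, and update rule are invented purely for illustration) of how a reward-weighted update nudges a model toward responses that a reward model scores as more "introspective":

    # Toy sketch, not a real RLHF pipeline: the reward function, canned
    # responses, and update rule are invented for illustration only. It shows
    # how a reward-weighted update pushes probability mass toward responses
    # a reward model scores as more "introspective about the input".
    import math

    def reward(response: str) -> float:
        # Hypothetical reward model: favors responses that comment on the
        # prompt/situation itself, a stand-in for "introspective" behavior.
        markers = ["this appears to be a test", "you seem to be"]
        return float(sum(m in response.lower() for m in markers))

    # Toy "policy": a distribution over canned responses, standing in for an LLM.
    responses = [
        "The answer is 42.",
        "This appears to be a test of my reasoning.",
        "You seem to be checking whether I notice the setup.",
    ]
    logits = [0.0, 0.0, 0.0]

    def softmax(xs):
        exps = [math.exp(x) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    # One simplified reward-weighted update: responses the reward model
    # prefers get their probability pushed up -- the "bias towards" step.
    probs = softmax(logits)
    baseline = sum(p * reward(r) for p, r in zip(probs, responses))
    for i, r in enumerate(responses):
        logits[i] += 0.5 * (reward(r) - baseline)

    print("before:", [round(p, 2) for p in probs])
    print("after: ", [round(p, 2) for p in softmax(logits)])

After the update, the two responses that comment on the situation gain probability at the expense of the plain answer, which is the kind of bias being described.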
That will always be the answer for language models built on the current architectures.
The above being true doesn't mean it isn't interesting when an LLM's outputs track the "unstated" intentions of the humans providing the inputs.
But hey, we do that all the time with text, because of patterns we've come to recognize from the situations surrounding it. This thread is rife with people being sarcastic, pedantic, etc., and I bet any of the LLMs released in the past 2-3 years can discern many of those subtle intentions of the writers.
And of course they can. They've been trained on trillions of tokens of text written by humans with intentions and assumptions baked in, and have gone through a substantial (if unknown) amount of RLHF.
The stochastic mappings aren't "realizing" anything. They're doing exactly what they were trained to do.
The meaning we imbue the outputs with does not change how LLMs function.