
246 points by doener | 1 comment
ozgune:
I had a related, but orthogonal question about multilingual LLMs.

When I ask smaller models a question in English, the model does well. When I ask the same model a question in Turkish, the answer is mediocre. When I ask the model to translate my question into English, get the answer, and translate the answer back to Turkish, the model again does well.

For example, I tried the above with Llama 3.3 70B, and asked it to plan me a 3-day trip to Istanbul. When I asked Llama to do the translations between English <> Turkish, the answer was notably better.
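
The round trip I mean looks roughly like the sketch below (a minimal illustration assuming an OpenAI-compatible endpoint serving the model; the base URL, model name, and prompt wording are stand-ins, not my exact setup):

    # Translate the question to English, answer it there, translate the answer back.
    from openai import OpenAI

    # Hypothetical local endpoint and model identifier, purely for illustration.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
    MODEL = "llama-3.3-70b-instruct"

    def ask(prompt: str) -> str:
        """Send a single-turn prompt and return the model's reply."""
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def ask_via_english(question: str, lang: str = "Turkish") -> str:
        """Round trip: translate to English, answer in English, translate back."""
        english_q = ask(f"Translate the following {lang} text to English:\n\n{question}")
        english_a = ask(english_q)
        return ask(f"Translate the following English text back to {lang}:\n\n{english_a}")

    question_tr = "İstanbul'a 3 günlük bir gezi planla."
    print(ask(question_tr))              # direct Turkish answer
    print(ask_via_english(question_tr))  # round-tripped answer, notably better in my tests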

Has anyone else observed similar behavior?

petesergeant:
Fascinating phenomenon. It's like a new Sapir–Whorf hypothesis: do language models behave differently in different languages because of the languages themselves, or because of the training material available in those languages?
evgen:
This is one of those subtle clues that the LLM does not actually 'know' anything. It is giving you the best consensus answer to your prompt from the data the weights were trained on; if that data was input primarily in English, then you are going to get better results asking in English. It is still Searle's Chinese Room, except you first need to go to the 'Language X -> English' room, deliver its output to the general query room, and then deliver that result to the 'English -> Language X' room.
vjerancrnjak:
Exactly. I found it surprising how early on it was assumed that prompting "Imagine you're the smartest and most creative person in the world, ..." would somehow produce the most creative output.

It's clear from the start that language modelling is not quite there yet. It can't reason about low-level structure (letters, syllables, rhyme, rhythm), and it can't map all languages onto a single clear representation. The representation is a mushy, distributed mess out of which you get good or bad results.

It's brilliant how relevant the responses are, and how good they are when they're correct, but the underlying process is driven by very weird internal representations.