
ozgune ◴[] No.43691597[source]
I had a related, but orthogonal question about multilingual LLMs.

When I ask a smaller model a question in English, it does well. When I ask the same model the question in Turkish, the answer is mediocre. When I ask the model to translate my question into English, answer it, and then translate the answer back to Turkish, it again does well.

For example, I tried the above with Llama 3.3 70B, and asked it to plan me a 3-day trip to Istanbul. When I asked Llama to do the translations between English <> Turkish, the answer was notably better.
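If anyone wants to reproduce this, here's a minimal sketch of that translate → answer → translate-back loop, assuming an OpenAI-compatible endpoint serving Llama 3.3 70B; the base_url, model name, and prompt wording are placeholders I made up, not anything official:

    from openai import OpenAI

    # Hypothetical local endpoint and model identifier -- point these at
    # whatever is serving Llama 3.3 70B for you.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    MODEL = "llama-3.3-70b-instruct"

    def ask(prompt: str) -> str:
        # Single-turn helper: send one user message, return the model's reply.
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def answer_via_english(question_tr: str) -> str:
        # 1. Translate the Turkish question into English.
        question_en = ask("Translate this Turkish text to English:\n\n" + question_tr)
        # 2. Answer it in English, where the model has seen far more data.
        answer_en = ask(question_en)
        # 3. Translate the English answer back into Turkish.
        return ask("Translate this English text to Turkish:\n\n" + answer_en)

    print(answer_via_english("İstanbul'a 3 günlük bir gezi planlar mısın?"))

Comparing that output against asking the Turkish question directly makes the quality gap easy to see.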

Has anyone else observed similar behavior?

replies(11): >>43691620 #>>43691751 #>>43691774 #>>43692427 #>>43692596 #>>43692803 #>>43692874 #>>43693906 #>>43695475 #>>43698229 #>>43698667 #
petesergeant ◴[] No.43691620[source]
Fascinating phenomenon. It's like a new Sapir–Whorf hypothesis: do language models behave differently in different languages because of the languages themselves, or because of the training material available in each?
replies(3): >>43691662 #>>43691782 #>>43692839 #
shaky-carrousel ◴[] No.43691662[source]
They absolutely do. They know more in English than in Spanish; I've seen that in every model since the beginning.
replies(1): >>43697719 #
namaria ◴[] No.43697719[source]
They have more data in English than Spanish. LLMs don't know or reason or follow instructions. They merely render text continuations that are coherent with the expectations you set when prompting. The fact that they are not able to sustain the illusion in languages with less available training data than English should make that clear.
replies(1): >>43699837 #
shaky-carrousel ◴[] No.43699837[source]
> They have more data in English than Spanish.

Yep, that there seems like the definition of knowing. Don't worry, your humanity isn't at risk.

replies(1): >>43701737 #
namaria ◴[] No.43701737[source]
No, mental models matter. This has nothing to do with AGI doomerism.

Knowing implies reasoning. LLMs don't "know" things. These statistical models generate text continuations. Having a mental model that they "know" things, that they can "reason" or "follow instructions", is driving all sorts of poor decisions.

Software has an abstraction fetish. So much of the material available to learners is riddled with analogies and a "you don't need to know that" attitude. That is counterproductive, and I think having accurate mental models matters.

replies(2): >>43701833 #>>43710415 #
shaky-carrousel ◴[] No.43710415[source]
I have to disagree. We've been using "knowing" for programs for decades without requiring it to imply reasoning. Just because the output now looks more realistic doesn't mean we need to suddenly get philosophical about it. That shift says more about us than about the software.

And while accurate mental models can help in certain contexts, they're not always necessary. I don't need a detailed model of how my OS handles file operations to use it effectively. A high-level understanding is usually enough. Insisting on deep internal accuracy in every case seems more like gatekeeping than good practice.