
246 points | doener | 1 comment
ozgune | No.43691597
I had a related, but orthogonal question about multilingual LLMs.

When I ask smaller models a question in English, the model does well. When I ask the same model a question in Turkish, the answer is mediocre. When I ask the model to translate my question into English, get the answer, and translate the answer back to Turkish, the model again does well.

For example, I tried the above with Llama 3.3 70B, and asked it to plan me a 3-day trip to Istanbul. When I asked Llama to do the translations between English <> Turkish, the answer was notably better.
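In case it helps anyone reproduce this, the round trip looks roughly like the sketch below (Python, assuming an OpenAI-compatible endpoint serving Llama 3.3 70B; the base_url, model name, and prompts are placeholders, not anything specific to my setup):

    from openai import OpenAI

    # Placeholder endpoint/model; any OpenAI-compatible server hosting Llama 3.3 70B works.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    MODEL = "llama-3.3-70b-instruct"

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    question_tr = "Bana İstanbul'a 3 günlük bir gezi planla."  # "Plan me a 3-day trip to Istanbul"

    # Direct: ask in Turkish, get a Turkish answer (mediocre in my experience).
    answer_direct = ask(question_tr)

    # Round trip: translate to English, answer in English, translate back (notably better).
    question_en = ask("Translate the following to English; return only the translation:\n" + question_tr)
    answer_en = ask(question_en)
    answer_tr = ask("Translate the following to Turkish; return only the translation:\n" + answer_en)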

Has anyone else observed similar behavior?

1. hnfong | No.43692427
I'd mentally put this in the same box as "chain of thought", where models perform better when they explicitly describe their reasoning steps. The only difference in your case is that the model is undertrained on non-English data, so its "next token prediction" on non-English prompts is less robust, and explicitly converting to English and then back makes it better.

This is probably the case for the "deep reasoning" models as well. If you try DeepSeek R1, for example, it will likely reason in either English or Chinese (where it is presumably well trained) even if the prompt is in another language.
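If you want to check which language the trace comes out in, something like this works against the hosted model (a rough sketch assuming DeepSeek's OpenAI-compatible API, which returns the reasoning trace in a separate reasoning_content field; other R1 hosts may expose it differently):

    from openai import OpenAI

    # Assumes DeepSeek's hosted OpenAI-compatible API; key and prompt are placeholders.
    client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

    resp = client.chat.completions.create(
        model="deepseek-reasoner",  # R1
        messages=[{"role": "user", "content": "Bana İstanbul'a 3 günlük bir gezi planla."}],  # Turkish prompt
    )

    # Reasoning trace and final answer come back separately; even for a Turkish
    # prompt, the trace tends to be in English or Chinese.
    print(resp.choices[0].message.reasoning_content)
    print(resp.choices[0].message.content)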