
ozgune ◴[] No.43691597[source]
I had a related, but orthogonal question about multilingual LLMs.

When I ask smaller models a question in English, the model does well. When I ask the same model a question in Turkish, the answer is mediocre. When I ask the model to translate my question into English, get the answer, and translate the answer back to Turkish, the model again does well.

For example, I tried the above with Llama 3.3 70B, and asked it to plan me a 3-day trip to Istanbul. When I asked Llama to do the translations between English <> Turkish, the answer was notably better.
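
Roughly the round-trip I mean, as a sketch (the endpoint, model name, and prompts below are just placeholders for however you actually serve Llama):

    from openai import OpenAI

    # Placeholder: point this at whatever OpenAI-compatible server hosts your Llama model.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    MODEL = "meta-llama/Llama-3.3-70B-Instruct"

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def answer_via_english(question_tr: str) -> str:
        # Turkish question -> English question -> English answer -> Turkish answer.
        question_en = ask(f"Translate this into English:\n{question_tr}")
        answer_en = ask(question_en)
        return ask(f"Translate this into Turkish:\n{answer_en}")

    print(answer_via_english("İstanbul'a 3 günlük bir gezi planlar mısın?"))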

Has anyone else observed similar behavior?

replies(11): >>43691620 #>>43691751 #>>43691774 #>>43692427 #>>43692596 #>>43692803 #>>43692874 #>>43693906 #>>43695475 #>>43698229 #>>43698667 #
mrweasel ◴[] No.43691751[source]
Someone apparently did observe ChatGPT (I think it was ChatGPT) switch to Chinese for some parts of its reasoning/calculations and then back to English for the final answer. That's somehow even weirder than the LLM giving different answers depending on the input.
replies(6): >>43692168 #>>43692261 #>>43692574 #>>43693043 #>>43695468 #>>43695859 #
maxloh ◴[] No.43692261[source]
> the LLM giving different answers depending on the input.

LLMs are actually designed to have some randomness in their responses.

To make the answer reproducible, set the temperature to 0 (eliminating sampling randomness) and provide a static seed (ensuring consistent results) in the LLM's configuration.
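
With the OpenAI Python client, for example, that would look roughly like this (the seed parameter is documented as best-effort, so it still isn't a hard guarantee of determinism):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Plan a 3-day trip to Istanbul."}],
        temperature=0,        # remove sampling randomness
        seed=42,              # static seed for (best-effort) reproducibility
    )
    print(resp.choices[0].message.content)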

replies(2): >>43692350 #>>43692529 #
jll29 ◴[] No.43692529[source]
The parameter that controls how much the (pseudo-)random number generator influences the output is called "temperature" in most models.

Setting it to 0 in theory eliminates all randomness: instead of sampling one word from the list of likely next words, only the MOST PROBABLE word is ever chosen.

However, in practice, setting the temperature to 0 in most GUIs does not actually set it to 0, but to a "very small" value ("epsilon"), the reason being to avoid a division-by-zero exception/crash in the underlying formula. So don't be surprised if you cannot get rid of random behavior entirely.
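
To make that concrete, here is a minimal sampling sketch (Python/numpy, simplified, not any particular implementation): the logits are divided by the temperature before the softmax, which is exactly the division that blows up at 0 unless you special-case it as greedy decoding.

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        logits = np.asarray(logits, dtype=float)
        rng = rng or np.random.default_rng()
        if temperature == 0:
            return int(np.argmax(logits))    # greedy: always the most probable token
        scaled = logits / temperature        # the division that breaks at T = 0
        scaled -= scaled.max()               # shift for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(logits), p=probs))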

replies(1): >>43696479 #
1. Asraelite ◴[] No.43696479[source]
> the reason being to avoid a division-by-zero exception/crash in the underlying formula

Why don't they just special-case it?