
514 points andy99 | 1 comment
isusmelj No.44536509
I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

replies(5): >>44536696 #>>44536809 #>>44537201 #>>44539869 #>>44541746 #
lllllm No.44539869
No, the model has nothing to do with Llama. We are using our own architecture and training from scratch. Llama also does not have open training data and is non-compliant, in contrast to this model.

Source: I'm part of the training team

replies(6): >>44539877 #>>44540067 #>>44540272 #>>44540736 #>>44540850 #>>44540873 #
macawfish No.44539877
Are you using DBpedia?
replies(1): >>44539987 #
lllllm No.44539987
No. The main source is fineweb2, with additional compliance filtering, toxicity removal, and quality filters such as fineweb2-hq.
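
To make that pipeline shape concrete, here is a minimal sketch of streaming a fineweb2 subset and chaining such filters with the Hugging Face datasets library. It is an illustration only, not the actual training pipeline: the dataset id, config name, column names, and the filter bodies are placeholder assumptions.

    # Minimal sketch, not the actual pipeline: stream one fineweb2 subset
    # and chain compliance / toxicity / quality filters.
    # The dataset id and config name are assumptions -- check the hub.
    import itertools
    from datasets import load_dataset

    def passes_compliance(doc):
        # placeholder: e.g. drop documents from opted-out domains
        return "opted-out.example" not in doc.get("url", "")

    def passes_toxicity(doc):
        # placeholder: a real pipeline would score with a trained classifier
        return True

    def passes_quality(doc):
        # placeholder: e.g. keep documents above a simple length threshold
        return len(doc["text"]) > 200

    ds = load_dataset("HuggingFaceFW/fineweb-2", name="deu_Latn",
                      split="train", streaming=True)
    kept = ds.filter(passes_compliance).filter(passes_toxicity).filter(passes_quality)

    for doc in itertools.islice(kept, 3):
        print(doc["text"][:80])

In a real pipeline the placeholder predicates would be replaced by the published compliance, toxicity, and fineweb2-hq-style quality scorers.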
replies(1): >>44540171 #
PeterStuer No.44540171
Thx for engaging here.

Can you comment on how the filtering impacted language coverage? E.g. fineweb2 has 1800+ languages, but some with very little actual representation, while fineweb2-hq has just 20, but each with a substantial dataset.

(I'm personally most interested in covering the 24 official EU languages.)

replies(1): >>44540219 #
lllllm No.44540219
We kept all 1800+ (script/language) pairs, not only the quality-filtered ones. How a mix of quality-filtered and unfiltered languages impacts the overall mixture is still an open question. Preliminary research (Section 4.2.7 of https://arxiv.org/abs/2502.10361) indicates that quality filtering can mitigate the curse of multilinguality to some degree, and so facilitate cross-lingual generalization, but it remains to be seen how strong this effect is at larger scale.
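
For intuition on how the language mix interacts with the curse of multilinguality, here is a minimal sketch of the standard temperature-based sampling scheme (p_i proportional to n_i^alpha) commonly used in multilingual training; it illustrates the general technique only, not this model's actual data mixture, and the token counts are invented.

    # Minimal sketch of temperature-based language mixing (a common
    # technique, not necessarily what this model uses):
    # p_i ~ n_i**alpha, where n_i is the token count of language i;
    # alpha < 1 flattens the distribution, upweighting low-resource languages.
    def mixing_weights(token_counts, alpha=0.3):
        scaled = {lang: n ** alpha for lang, n in token_counts.items()}
        total = sum(scaled.values())
        return {lang: s / total for lang, s in scaled.items()}

    # invented token counts, for illustration only
    counts = {"eng_Latn": 1000e9, "deu_Latn": 60e9, "gsw_Latn": 0.1e9}
    print(mixing_weights(counts, alpha=1.0))  # proportional: English dominates
    print(mixing_weights(counts, alpha=0.3))  # flatter mix across languages

Quality filtering changes the effective n_i per language, which is exactly where the open question about mixing filtered and unfiltered languages comes in.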
we kept all 1800+ (script/language) pairs, not only the quality filtered ones. the question if a mix of quality filtered and not languages impacts the mixing is still an open question. preliminary research (Section 4.2.7 of https://arxiv.org/abs/2502.10361 ) indicates that quality filtering can mitigate the curse of multilinguality to some degree, so facilitate cross-lingual generalization, but it has to be seen how strong this effect is on larger scale