586 points by andy99 | 2 comments

isusmelj No.44536509
I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

replies(5): >>44536696, >>44536809, >>44537201, >>44539869, >>44541746
lllllm No.44539869
No, the model has nothing to do with Llama. We are using our own architecture and training from scratch. Llama also does not have open training data and is non-compliant, in contrast to this model.

Source: I'm part of the training team

replies(6): >>44539877, >>44540067, >>44540272, >>44540736, >>44540850, >>44540873
1. Al-Khwarizmi No.44540736
So you're not going to use copyrighted data for training? That's going to be a disadvantage with respect to LLaMA and other well-known models; it's an open secret that everyone is using everything they can get their hands on.

Good luck though, very needed project!

replies(1): >>44540875
2. badsectoracula No.44540875
Not sure about Swiss law, but the EU AI Act, together with Directive (EU) 2019/790 (the Copyright in the Digital Single Market directive) that it piggybacks on for this topic, does allow training on copyrighted data as long as any opt-out mechanisms (e.g. robots.txt) are respected. AFAICT this LLM was trained while respecting those mechanisms (and, as linked elsewhere, they didn't find any practical difference in performance; note that there is an exception allowing the opt-out mechanisms to be ignored for research purposes, which is how they could make that comparison).
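
To make the opt-out point concrete, here is a minimal sketch (not from the project, just an illustration of the general mechanism) of how a crawler can check a site's robots.txt before adding a page to a training corpus, using Python's standard urllib.robotparser; the user-agent string is a made-up placeholder:

    # Hypothetical sketch: honour robots.txt opt-outs before fetching a page.
    from urllib import robotparser
    from urllib.parse import urlparse

    USER_AGENT = "example-llm-crawler"  # placeholder, not the project's real bot name

    def allowed_to_fetch(url: str) -> bool:
        """Return True only if the site's robots.txt permits USER_AGENT to fetch url."""
        parsed = urlparse(url)
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
        try:
            rp.read()  # download and parse robots.txt
        except OSError:
            return False  # be conservative if robots.txt cannot be retrieved
        return rp.can_fetch(USER_AGENT, url)

    if __name__ == "__main__":
        print(allowed_to_fetch("https://example.com/some-article"))

Real pipelines typically layer additional opt-out signals on top of this (e.g. HTTP headers or dataset-level exclusion lists), but robots.txt is the mechanism most commonly cited.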