
533 points by andy99 | 1 comment
isusmelj:
I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

alfalfasprout:
The infra needed to train a SOTA LLM does get pretty complex. People assume it's as simple as loading up the architecture and a dataset and pointing something like Ray at it. In practice a lot goes into designing the dataset, the eval pipelines, and the training approach, plus maximizing hardware utilization, dealing with cross-node latency, and recovering from failures mid-run (sketched below).
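
To make the failure-recovery point concrete, here's a minimal single-process sketch of checkpoint-and-resume in PyTorch. Everything in it (the tiny model, the checkpoint path, the save interval) is a made-up stand-in, not anything from the actual ETH/EPFL setup; real multi-node runs layer the same idea onto sharded optimizer state and a scheduler that relaunches failed workers.

    # Hypothetical sketch of a fault-tolerant training loop with periodic checkpointing.
    # Assumes PyTorch; the model, data, and paths are illustrative stand-ins only.
    import os
    import torch
    import torch.nn as nn

    CKPT_PATH = "checkpoint.pt"  # hypothetical checkpoint location

    def save_checkpoint(step, model, optimizer):
        # Write to a temp file and rename, so a crash mid-save never
        # corrupts the last good checkpoint.
        tmp = CKPT_PATH + ".tmp"
        torch.save({"step": step,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, tmp)
        os.replace(tmp, CKPT_PATH)

    def load_checkpoint(model, optimizer):
        # Resume from the last checkpoint if one exists; otherwise start at step 0.
        if not os.path.exists(CKPT_PATH):
            return 0
        ckpt = torch.load(CKPT_PATH)
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        return ckpt["step"]

    model = nn.Linear(512, 512)                      # stand-in for a real LLM
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    start_step = load_checkpoint(model, optimizer)   # pick up where the last run died

    for step in range(start_step, 10_000):
        x = torch.randn(8, 512)                      # stand-in batch
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 500 == 0:
            save_checkpoint(step, model, optimizer)  # periodic checkpoint for recovery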

But it's good to have more and more players in this space.