
511 points andy99 | 2 comments
isusmelj ◴[] No.44536509[source]
I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

replies(5): >>44536696 #>>44536809 #>>44537201 #>>44539869 #>>44541746 #
lllllm ◴[] No.44539869[source]
No, the model has nothing to do with Llama. We are using our own architecture and training from scratch. Llama also does not have open training data and is non-compliant, in contrast to this model.

Source: I'm part of the training team

replies(6): >>44539877 #>>44540067 #>>44540272 #>>44540736 #>>44540850 #>>44540873 #
1. danielhanchen ◴[] No.44540067[source]
If you guys need help on GGUFs + Unsloth dynamic quants + finetuning support via Unsloth https://github.com/unslothai/unsloth on day 0 / 1, more than happy to help :)
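For readers unfamiliar with what "day 0 / 1" Unsloth support means in practice, here is a minimal sketch of the usual finetune-and-export flow with Unsloth; the model identifier below is a placeholder, not the actual release name, and the exact integration for this model would differ.

```python
# Minimal sketch of a typical Unsloth workflow (assumed, not the official integration).
from unsloth import FastLanguageModel

# Load the base model in 4-bit; "swiss-ai/placeholder-model" is hypothetical.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="swiss-ai/placeholder-model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient finetuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# ... run training here (e.g. with a standard TRL SFTTrainer) ...

# Export to GGUF for llama.cpp-style local inference.
model.save_pretrained_gguf("model-gguf", tokenizer, quantization_method="q4_k_m")
```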
replies(1): >>44540233 #
2. lllllm ◴[] No.44540233[source]
Absolutely! I sent you a LinkedIn message last week, but here seems to work much better. Thanks a lot!