
604 points andy99 | 2 comments
isusmelj ◴[] No.44536509[source]
I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

replies(5): >>44536696 #>>44536809 #>>44537201 #>>44539869 #>>44541746 #
asjir ◴[] No.44541746[source]
I'd be more concerned about the chosen size being 70B (DeepSeek R1 has 671B), which makes catching up with SOTA harder to begin with.
replies(1): >>44541843 #
1. zettabomb ◴[] No.44541843{3}[source]
SOTA performance is relative to model size. If it performs better than other models in the 70B range (e.g., Llama 3.3), it could be quite useful. Not everyone has the VRAM to run the full-fat DeepSeek R1 (rough numbers sketched below).
replies(1): >>44543323 #
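A back-of-the-envelope sketch of the VRAM gap being discussed. This is a minimal illustration, assuming fp16/bf16 weights at 2 bytes per parameter; it ignores KV cache, activations, and runtime overhead, and the model sizes are the rounded figures from the thread:

    # Rough VRAM needed just to hold the weights of a dense model.
    # 2 bytes/param assumes fp16/bf16; quantization would shrink this.
    def weight_vram_gb(params_billion, bytes_per_param=2):
        return params_billion * 1e9 * bytes_per_param / 1024**3

    for name, size_b in [("70B dense", 70), ("DeepSeek R1, 671B total", 671)]:
        print(f"{name}: ~{weight_vram_gb(size_b):.0f} GB for weights alone")
    # 70B dense: ~130 GB for weights alone
    # DeepSeek R1, 671B total: ~1250 GB for weights alone

Even heavily quantized, the full R1 weights are far beyond typical consumer hardware, which is the "not everyone has the VRAM" point above.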
2. tough ◴[] No.44543323[source]
Also, isn't DeepSeek's model a Mixture of Experts? Meaning not all params ever get activated on a single forward pass? (Toy illustration below.)

70B feels like the best balance between being runnable locally and being decent for regular use.

Maybe not SOTA, but a great first step.
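On the Mixture-of-Experts point: a toy routing sketch of why not all parameters are active per forward pass. The expert count, top-k, and layer sizes here are made up for illustration and are not DeepSeek's actual configuration:

    import numpy as np

    # Toy MoE feed-forward layer: n_experts expert MLPs, but each token
    # only passes through the top_k experts chosen by a router.
    n_experts, top_k = 16, 2                     # illustrative, not DeepSeek's config
    d_model, d_ff = 1024, 4096

    expert_params = 2 * d_model * d_ff           # up- and down-projection per expert
    total_params = n_experts * expert_params
    active_params = top_k * expert_params        # only top_k experts run per token

    router_logits = np.random.randn(n_experts)   # stand-in for a learned router
    chosen = np.argsort(router_logits)[-top_k:]  # indices of the top-k experts

    print(f"total expert params:     {total_params:,}")
    print(f"active params per token: {active_params:,} ({active_params/total_params:.0%})")
    print(f"experts used for this token: {sorted(chosen.tolist())}")

That is why a 671B-total MoE can have per-token compute closer to a much smaller dense model, even though you still need memory for all of the experts.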