
321 points | denysvitali | 1 comment
lllllm No.45144461
Martin here from the Apertus team, happy to answer any questions if I can.

the full collection of models is here: https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...

PS: you can run this locally on your Mac with this one-liner:

pip install mlx-lm

mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"

trcf22 No.45147079
Great job! Would it be possible to know the cost of training such a model?
menaerus No.45147663
From their report:

> Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run.
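
As a quick sanity check, a minimal sketch of the arithmetic behind that ~5 GWh estimate, using only the numbers quoted above (4096 modules, 560 W each, 90 days); the assumption of constant average draw over the full run is mine:

import math

modules = 4096      # Grace-Hopper modules, from the report quote
power_w = 560       # assumed average draw per module (W), from the quote
days = 90           # estimated pretraining duration, from the quote

energy_wh = modules * power_w * days * 24   # watt-hours
print(f"{energy_wh / 1e9:.2f} GWh")         # ~4.95 GWh, i.e. roughly 5 GWh

This covers compute power only; cooling and other datacenter overheads would come on top.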