
321 points by denysvitali | 4 comments
dcreater No.45143868
I want this to succeed and hope it does. But the tea leaves don't look good at the moment:

- model sizes that the industry was at 2-3 generations ago (Llama 3.1 era)
- conspicuous lack of benchmark results in the announcements
- not on OpenRouter, no GGUFs as yet

replies(1): >>45143911 #
lllllm No.45143911
Benchmarks: we provide plenty in the 100+ page tech report here: https://github.com/swiss-ai/apertus-tech-report/blob/main/Ap...

Quantizations: available now in MLX: https://github.com/ml-explore/mlx-lm (GGUF coming soon; not trivial due to the new architecture)

Model sizes: many good dense models today still lie in the range between our chosen small and large sizes.

replies(1): >>45144106 #
dcreater No.45144106
Thank you! Why are the comparisons to Llama 3.1-era models?
replies(1): >>45144417 #
lllllm No.45144417
We compared to GPT-OSS-20B, Llama 4, and Qwen 3, among many others. Which models do you think are missing, among open-weight and fully open models?

Note that we have a specific focus on multilinguality (over 1000 languages supported), not only on English.

replies(2): >>45145007 #>>45146593 #
kamranjon No.45145007
How did it compare with the Gemma 3 models? I've been impressed with Gemma 27B. I try out local models frequently, and I'm excited to boot up your 70B model on my 128GB MacBook Pro when I get home!
dcreater No.45146593
Ah, I'm sorry, I missed that. I'm not usually that blind.