(huggingface.co)

321 points | denysvitali | 1 comment | 02 Sep 25 20:14 UTC

cmdrk [03 Sep 25 03:44 UTC] No.45112073

Does their training corpus respect copyrights, or do you have to follow their opt-out procedure to keep them from consuming your data? Assuming it's the latter, it's open-er but still not quite there.

1. SparkyMcUnicorn [05 Sep 25 19:13 UTC] No.45142449

>>45112073 #

Your question is addressed in the opening abstract: https://github.com/swiss-ai/apertus-tech-report/raw/refs/hea...

> Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for copyrighted, non-permissive, toxic, and personally identifiable content.

Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS