321 points denysvitali | 1 comment
cmdrk ◴[] No.45112073[source]
Does their training corpus respect copyrights, or do you have to follow their opt-out procedure to keep them from consuming your data? Assuming it’s the latter, it’s open-er but still not quite there.
replies(2): >>45142143 #>>45142449 #
1. SparkyMcUnicorn ◴[] No.45142449[source]
Your question is addressed in the opening abstract: https://github.com/swiss-ai/apertus-tech-report/raw/refs/hea...

> Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for copyrighted, non-permissive, toxic, and personally identifiable content.
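
For anyone wondering what "respecting robots.txt exclusions" looks like mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not taken from the Apertus pipeline: the `allowed_urls` helper, the `ExampleBot` user agent, and the sample robots.txt are all made up for illustration, and it only shows the basic per-URL check, not the retroactive filtering the abstract mentions.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs that robots_txt permits user_agent to fetch.

    Hypothetical helper for illustration; not part of any real pipeline.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


# Made-up robots.txt: ExampleBot is excluded from /private/,
# every other crawler is allowed everywhere.
ROBOTS = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

urls = [
    "https://example.com/index.html",
    "https://example.com/private/notes.html",
]

kept = allowed_urls(ROBOTS, "ExampleBot", urls)
print(kept)
```

Under these assumptions, only the `index.html` URL survives for `ExampleBot`. A real corpus filter would presumably fetch each site's robots.txt and apply it to already-crawled documents, but the details of how Apertus does that are in the report, not here.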