Does their training corpus respect copyrights or do you have to follow their opt out procedure to keep them from consuming your data? Assuming it’s the latter, it’s open-er but still not quite there.
replies(2):
> Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for copyrighted, non-permissive, toxic, and personally identifiable content.