(arxiv.org)

2 points badmonster | 2 comments | 11 Jul 25 18:02 UTC

1. badmonster [11 Jul 25 18:02 UTC] No.44535221

A subtle but powerful insight: large multimodal models like CLIP don't just learn individual concepts. Their performance also depends heavily on how often those concepts appear together during training.
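To make "how often concepts appear together" concrete, here is a minimal sketch that scores word pairs by caption-level co-occurrence. Using pointwise mutual information (PMI) as the measure is an assumption on my part, as are the function name `cooccurrence_pmi` and the toy captions; none of it is taken from the linked paper's code or data.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_pmi(captions):
    """Return {(word_a, word_b): PMI} for word pairs sharing a caption.

    Hypothetical helper for illustration only. Pairs are keyed in
    alphabetical order, and pairs that never co-occur are omitted.
    """
    n = len(captions)
    word_counts = Counter()
    pair_counts = Counter()
    for caption in captions:
        # Count each word at most once per caption.
        words = sorted(set(caption.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    pmi = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        # PMI > 0: the pair appears together more often than chance.
        pmi[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return pmi


# Toy stand-in for pretraining captions (made up for this example).
captions = [
    "red apple",
    "red apple",
    "green banana",
    "green apple",
]
scores = cooccurrence_pmi(captions)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

In this toy set, "red apple" gets a positive score and "green apple" a negative one, which is the kind of imbalance the comment is pointing at: a pair that is rare in training data is a harder composition even when both words are individually common.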


Impact of Pretraining Word Co-Occurrence on Compositional Generalization In