His tweets are quite interesting, e.g.:
https://x.com/fchollet/status/1638057646602489856
> LLMs are trained on much more than the whole Internet -- they also consume handcrafted answers produced by armies of highly qualified data annotators (often domain experts). Today approximately 20,000 people are employed full-time to produce training data for LLMs.