
129 points | celias | 2 comments
rfv6723 No.44393382
Apple's AI team keeps going against the bitter lesson, focusing on small on-device models.

Let's see how this turns out in the long term.

replies(5): >>44393454 #>>44393509 #>>44393622 #>>44394586 #>>44394727 #
janalsncm No.44394586
The bitter-er lesson is that distillation from bigger models works pretty damn well. It's great news for the GPU-poor, not so great for the guys training the models we distill from.
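For anyone unfamiliar, the core trick is tiny. A minimal sketch of the standard soft-target loss (Hinton-style distillation; this assumes a PyTorch setup, and the teacher/student logits are placeholders, not anyone's actual training code):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student toward the teacher's
    # temperature-smoothed output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```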
replies(1): >>44401947 #
1. rfv6723 No.44401947
Distillation is great for researchers and hobbyists.

But nearly all frontier models have anti-distillation ToS, so distillation is out of the question for Western commercial companies like Apple.

replies(1): >>44402386 #
2. janalsncm No.44402386
Even if Apple needs to train an LLM from scratch, they can distill it and deploy the student on edge devices. From that point on, inference is effectively free for them: the compute runs on the customer's hardware.
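A hypothetical sketch of that last deployment step, assuming a PyTorch student and Apple's coremltools converter (the model architecture and shapes here are made up for illustration):

```python
import torch
import coremltools as ct

# Stand-in for a distilled student model (illustrative architecture).
class Student(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 32)
        )

    def forward(self, x):
        return self.net(x)

student = Student().eval()
example = torch.rand(1, 128)
traced = torch.jit.trace(student, example)  # TorchScript graph for conversion

# Convert to a Core ML package that runs on-device (CPU/GPU/Neural Engine).
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    convert_to="mlprogram",
)
mlmodel.save("Student.mlpackage")
```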