129 points celias | 14 comments
1. rfv6723 No.44393382
Apple's AI team keeps going against the bitter lesson, focusing on small on-device models.

Let's see how this turns out in the long term.

replies(5): >>44393454 #>>44393509 #>>44393622 #>>44394586 #>>44394727 #
2. echelon No.44393454
Edge compute would be clutch, but Apple feels a decade too early.
replies(1): >>44394202 #
3. sipjca No.44393509
Somewhat hard to say how the cards will fall when the cost of 'intelligence' is coming down 1000x year over year while compute continues to scale. The bet should probably be made on both sides.
replies(1): >>44393733 #
4. peepeepoopoo137 No.44393622
"""The bitter lesson""" is how you get the current swath of massively unprofitable AI companies that are competing with each other over who can lose money faster.
replies(1): >>44393727 #
5. furyofantares No.44393727
I can't tell if you're perpetuating the myth that these companies are losing money on their paid offerings, or just overestimating how much money they lose on their free offerings.
replies(1): >>44394618 #
6. furyofantares No.44393733
10x year over year, not 1000x, right? The 1000x is from this 10x observation having held for 3 years.
replies(1): >>44416258 #
7. 7speter No.44394202
Maybe for a big LLM, but if they add some GPU cores and an order of magnitude or two more unified memory to their iDevices, or shoehorn M-series SoCs into high-tier iDevices (especially as their lithography process advances), image generation becomes more viable, no? Also, I thought I read somewhere that Apple wanted to run inference for simpler queries locally and switch to datacenter inference when the request is more complicated.

If they approach things this way, and transistor progress continues linearly (relative to the last few years), maybe they can make their first devices that meet these goals in… 2-3 years?
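
A hypothetical sketch of that local-vs-datacenter split; the function names and the complexity heuristic below are invented for illustration, not Apple's actual routing logic.

    # Hypothetical hybrid-inference routing: handle simple prompts on-device,
    # fall back to a datacenter model otherwise. Heuristic and names are invented.
    def estimate_complexity(prompt: str) -> float:
        # Toy stand-in: treat longer prompts as more complex.
        return min(len(prompt) / 2000.0, 1.0)

    def route_request(prompt: str, threshold: float = 0.5) -> str:
        if estimate_complexity(prompt) < threshold:
            return "on-device model"   # small local model
        return "datacenter model"      # larger hosted model

    print(route_request("What's the weather tomorrow?"))  # -> on-device model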

8. janalsncm No.44394586
The bitter-er lesson is that distillation from bigger models works pretty damn well. It's great news for the GPU-poor, not so great for the guys training the models we distill from.
replies(1): >>44401947 #
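
For reference, a minimal sketch of what distillation from a bigger model usually means in practice: the student is trained to match the teacher's softened output distribution. This is the classic soft-label recipe, not anything specific to Apple or any particular frontier model.

    # Generic knowledge-distillation loss (PyTorch): train a small student to
    # match a large teacher's temperature-softened output distribution.
    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, temperature=2.0):
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # KL(teacher || student), scaled by t^2 as in Hinton et al. (2015)
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t
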
9. janalsncm No.44394618
If it costs you a billion dollars to train a GPT-5 and I can distill your model for a million dollars and get 90% of the performance, that's a terrible deal for you. Or, more realistically, for whoever you borrowed from.
replies(1): >>44401954 #
10. yorwba No.44394727
They took a simple technique (normalizing flows), instantiated its basic building blocks with the most general neural network architecture known to work well (transformer blocks), and trained models of different sizes on various datasets to see whether it scales. Looks very bitter-lesson-pilled to me.

That they didn't scale beyond AFHQ (high-quality animal faces: cats, dogs and big cats) at 256×256 is probably not due to an explicit preference for small models at the expense of output resolution, but because this is basic research to test the viability of the approach. If this ever makes it into a product, it'll be a much bigger model trained on more data.

EDIT: I missed the second paper https://arxiv.org/abs/2506.06276 where they scale up to 1024×1024 with a 3.8-billion-parameter model. It seems to do about as well as diffusion models of similar size.
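
For context, the core mechanic of a normalizing flow is an invertible transform with a tractable Jacobian log-determinant, so exact log-likelihoods can be computed by change of variables. Below is a minimal sketch using a generic affine-coupling layer, one common way to build such flows; the papers instantiate the building blocks with transformer blocks rather than the small MLP used here, and this is not their exact transform.

    # Minimal affine-coupling flow step (generic sketch, not the papers' architecture).
    # Half of the input predicts a scale/shift for the other half, so the transform
    # is invertible and its log-det-Jacobian is just the sum of the log-scales.
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        def __init__(self, dim: int, hidden: int = 128):
            super().__init__()
            # The papers use transformer blocks here; a small MLP keeps the sketch short.
            self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim))
        def forward(self, x):
            x1, x2 = x.chunk(2, dim=-1)
            log_s, t = self.net(x1).chunk(2, dim=-1)
            log_s = torch.tanh(log_s)      # keep scales numerically tame
            z = torch.cat([x1, x2 * torch.exp(log_s) + t], dim=-1)
            log_det = log_s.sum(dim=-1)    # contribution to the log-likelihood
            return z, log_det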

11. rfv6723 No.44401947
Distillation is great for researchers and hobbyists.

But nearly all frontier models have anti-distillation ToS, so distillation is out of the question for Western commercial companies like Apple.

replies(1): >>44402386 #
12. rfv6723 No.44401954
Then if you offer your distilled model as a commercial service, you would get sued by OpenAI in court.
13. janalsncm No.44402386
Even if Apple needs to train an LLM from scratch, they can distill it themselves and deploy the distilled version on edge devices. From that point, inference is effectively free to them.
14. sipjca No.44416258
I believe the 1000x number I pulled is from SemiAnalysis or similar, using MMLU as the baseline benchmark and comparing the cost per token a year ago to today at the same score. Model improvements, hardware improvements, and software improvements combined make a massive difference, yielding much greater than 10x gains in terms of intelligence per dollar.
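
A toy illustration of how separate improvements at a fixed benchmark score can multiply into a much larger drop in cost per token; the individual factors below are invented, only the compounding is the point.

    # Invented numbers, purely to show how independent gains compound multiplicatively.
    model_gain = 10     # smaller/cheaper model reaching the same MMLU score
    hardware_gain = 5   # better $/FLOP from newer accelerators
    software_gain = 4   # quantization, batching, serving improvements
    combined = model_gain * hardware_gain * software_gain
    print(f"{combined}x cheaper per token at the same score")  # 200x in this toy example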