164 points ksec | 2 comments
vessenes ◴[] No.44498842[source]
Short version: a Qwen-2.5 7B model that has been turned into a diffusion model.

A couple of notable things. First, that you can do this at all (left-to-right model -> out-of-order diffusion via finetuning), which is really interesting. Second, the final version beats the original by a small margin on some benchmarks. Third, it’s in the ballpark of Gemini Diffusion, although not competitive, which is to be expected for any 7B-parameter model.

A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.
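
To make the parallelization point concrete, here is a toy sketch that only counts model passes. It is not how the model under discussion works: the "model" is a hypothetical stand-in that copies tokens from a known target, and every name below is made up for illustration.

```python
import random

MASK = None  # placeholder for a position that has not been generated yet


def left_to_right_decode(target):
    """Strict left-to-right generation: one pass per token, in order."""
    seq, passes = [], 0
    for tok in target:
        seq.append(tok)  # stand-in for "predict the next token"
        passes += 1
    return seq, passes


def diffusion_decode(target, k, seed=0):
    """Diffusion-style generation: each pass fills up to k masked
    positions anywhere in the sequence (out of order)."""
    rng = random.Random(seed)
    seq, passes = [MASK] * len(target), 0
    while any(t is MASK for t in seq):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        for i in rng.sample(masked, min(k, len(masked))):
            seq[i] = target[i]  # stand-in for "denoise this position"
        passes += 1
    return seq, passes


tokens = "def add(a, b): return a + b".split()  # 7 whitespace tokens
ar_seq, ar_passes = left_to_right_decode(tokens)
df_seq, df_passes = diffusion_decode(tokens, k=4)
print(ar_passes, df_passes)
```

Both loops end with the same sequence, but the second needs fewer passes because several positions are filled per pass. Pass counts are not wall-clock time, and a real model does not get to copy from the answer, so treat this as an illustration of the shape of the argument only.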

Overall, interesting. At some point these local models will get good enough for ‘real work’ and they will be slotted in at API providers rapidly. Apple’s game is on-device; I think we’ll see descendants of these start shipping with Xcode in the next year as just part of the coding experience.

replies(6): >>44498876 #>>44498921 #>>44499170 #>>44499226 #>>44499376 #>>44501060 #
iwontberude ◴[] No.44498876[source]
I think Apple will ultimately destroy the data center, I hope they succeed.
replies(4): >>44498886 #>>44499446 #>>44500433 #>>44501082 #
1. overfeed ◴[] No.44501082[source]
> I think Apple will ultimately destroy the data center

I think EVs destroying Ultra Large Container ships had better odds, and both are extremely unlikely. DC advantages Apple won't be able to overcome: compute density, cooling, cheap power, physical security to protect the software, scale + bandwidth, lower costs to customers of using contract manufacturers and/or commodity hardware.

There is no universe where large enterprises ditch their geo-located racks, let alone hyperscalers, especially now that they are scrounging for energy, reneging on pledges on renewables, and paying big bucks to bring nuclear power stations online.

replies(1): >>44513319 #
2. iwontberude ◴[] No.44513319[source]
It’s easy to imagine a universe where the hyperscalers are in a bubble, eventually find a limit to adding classical compute, and we hit peak datacenter and shrink from there.