
164 points ksec | 1 comments
vessenes No.44498842
Short version: a Qwen2.5 7B model that has been converted into a diffusion model.

A few notable things. First, the fact that you can do this at all (turning a left-to-right model into an out-of-order diffusion model via fine-tuning) is really interesting. Second, the final version beats the original by a small margin on some benchmarks. Third, it’s in the ballpark of Gemini Diffusion, although not competitive with it, which is to be expected for any 7B-parameter model.

A diffusion model brings real benefits in parallelization, since it can commit many tokens per step instead of one, and therefore in speed; to my mind the architecture is a better fit for coding than strict left-to-right generation.
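To make the parallelization point concrete, here is a toy Python sketch, assuming a masked-diffusion style decoder: start from a fully masked sequence and, on each forward pass, commit the positions the model is most confident about. `diffusion_decode`, `oracle` and `tokens_per_step` are hypothetical names, the "model" is a fake oracle that already knows the answer, and real diffusion LMs use their own noise schedules and remasking strategies. It only counts forward passes; it says nothing about output quality.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a position that has not been decoded yet


def autoregressive_steps(length: int) -> int:
    # Left-to-right decoding: one forward pass per generated token.
    return length


def diffusion_decode(
    length: int,
    predict: Callable[[List[Optional[str]]], List[Tuple[str, float]]],
    tokens_per_step: int,
) -> Tuple[List[str], int]:
    """Toy masked-diffusion decoding (hypothetical, for illustration only).

    Each step runs one forward pass that proposes a (token, confidence) pair
    for every position, then commits the `tokens_per_step` most confident
    masked positions, in any order rather than strictly left to right.
    """
    seq: List[Optional[str]] = [MASK] * length
    steps = 0
    while any(t is MASK for t in seq):
        proposals = predict(seq)  # one pass scores all positions at once
        steps += 1
        masked = [i for i, t in enumerate(seq) if t is MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = proposals[i][0]
    return [t for t in seq if t is not None], steps


TARGET = "def add ( a , b ) : return a + b".split()


def oracle(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    # Hypothetical stand-in for a model: it "knows" the target and gives each
    # position an arbitrary but distinct confidence, so decoding is out of order.
    n = len(TARGET)
    return [(tok, float((i * 7) % n)) for i, tok in enumerate(TARGET)]


out, steps = diffusion_decode(len(TARGET), oracle, tokens_per_step=4)
print(" ".join(out))  # def add ( a , b ) : return a + b
print(steps, "diffusion passes vs", autoregressive_steps(len(TARGET)), "autoregressive")
```

With 4 tokens committed per pass, the 12-token snippet needs 3 passes instead of 12. Committing more tokens per pass tends to trade quality for speed, and a bidirectional pass over the whole sequence does not cost the same as a KV-cached autoregressive step, so the pass count is only a rough proxy for latency.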

Overall, interesting. At some point these local models will get good enough for ‘real work’, and API providers will slot them in rapidly. Apple’s game is on-device; I think we’ll see descendants of these start shipping with Xcode in the next year as just part of the coding experience.

jeswin No.44498921
> to my mind the architecture is a better fit for coding

We have to see whether it produces better results. Humans have a planning phase followed by a part-by-part implementation phase, and plan/architect + codegen tools emulate this reasonably well.

dboreham No.44499629
It's delusional to think that most software projects can be planned in advance beyond "there will be a beginning, a middle, and an end". People do it, but in my experience their plans are generally ignored once implementation gets underway.
lokar No.44501077
That’s true at the project level. But surely when you sit down to actually work for a couple of hours, you think about what you are going to do, and then mostly do that.
layer8 No.44501570
In my experience it’s more fractal. Any subgoal, however small, may have its own plan-then-do sequence, or may even make you reconsider the higher-level plan. Of course, it somewhat depends on how run-of-the-mill the overall task is.