164 points ksec | 48 comments
1. vessenes ◴[] No.44498842[source]
Short version: A Qwen-2.5 7b model that has been turned into a diffusion model.

A couple of notable things: first is that you can do this at all (left-to-right model -> out-of-order diffusion via finetuning), which is really interesting. Second, the final version beats the original by a small margin on some benchmarks. Third, it’s in the ballpark of Gemini diffusion, although not competitive — to be expected for any 7B parameter model.

A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.
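The speed claim can be made concrete with a toy sketch (purely illustrative, not Apple's method: the stand-in `toy_model` cheats by peeking at the target string, and `per_step` stands in for the confidence-based unmasking a real diffusion LM would use). The point is that each denoising pass commits several positions at once, so a length-n sequence needs far fewer passes than strict left-to-right generation:

```python
MASK = "_"

def toy_model(seq, target):
    """Stand-in for a network: for each masked position, return a
    'predicted' token with a position-dependent confidence score."""
    return {i: (target[i], 1.0 / (1 + i))
            for i, t in enumerate(seq) if t == MASK}

def diffusion_decode(target, per_step=4):
    """Start from all-MASK; each pass commits the per_step most
    confident predictions in parallel."""
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        preds = toy_model(seq, target)
        ranked = sorted(preds.items(), key=lambda kv: -kv[1][1])
        for i, (tok, _) in ranked[:per_step]:
            seq[i] = tok
        steps += 1
    return "".join(seq), steps

def autoregressive_decode(target):
    """One token per forward pass, strictly left to right."""
    return target, len(target)
```

With an 11-character target and 4 commits per pass, the diffusion loop finishes in 3 passes where the autoregressive decoder needs 11; real diffusion LMs trade pass count against per-pass prediction quality.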

Overall, interesting. At some point these local models will get good enough for ‘real work’ and they will be slotted in at API providers rapidly. Apple’s game is on-device; I think we’ll see descendants of these start shipping with Xcode in the next year as just part of the coding experience.

replies(6): >>44498876 #>>44498921 #>>44499170 #>>44499226 #>>44499376 #>>44501060 #
2. iwontberude ◴[] No.44498876[source]
I think Apple will ultimately destroy the data center, I hope they succeed.
replies(4): >>44498886 #>>44499446 #>>44500433 #>>44501082 #
3. nxobject ◴[] No.44498886[source]
Shades of 1980s Apple v. Big Blue. I can't wait for the rehash of the "1984" ad.
4. jeswin ◴[] No.44498921[source]
> to my mind the architecture is a better fit for coding

We have to see if it produces better results. Humans have a planning phase, followed by a part-by-part implementation phase. This is reasonably well emulated by plan/architect + codegen tools.

replies(1): >>44499629 #
5. baobun ◴[] No.44499170[source]
Without having tried it, what I keep getting surprised with is how apparently widely different architectures (and in other cases training data) lead to very similar outcomes. I'd expect results to vary a lot more.
replies(3): >>44499473 #>>44499659 #>>44500645 #
6. koakuma-chan ◴[] No.44499226[source]
> At some point these local models will get good enough for ‘real work’

Are these small models good enough for anything but autocomplete?

replies(3): >>44499252 #>>44499320 #>>44503219 #
7. _heimdall ◴[] No.44499252[source]
Isn't that all they're designed for?

They predict more than just the second half of a word you are typing, but at the end of the day they're still just predicting what a human would have typed.

replies(1): >>44499283 #
8. koakuma-chan ◴[] No.44499283{3}[source]
I'm disappointed because I don't use autocomplete.
9. MangoToupe ◴[] No.44499320[source]
Given that's 99% of my usage of it, that alone would make me quite happy.
10. miroljub ◴[] No.44499376[source]
When we look at the small models suitable for running locally, by far the best programming model is DeepSeek-R1-0528-Qwen3-8B. It is quite comparable in real world usage even to much bigger models.
replies(2): >>44500751 #>>44501028 #
11. lxgr ◴[] No.44499446[source]
Maybe for compute, but not for storage.

Why can’t I backup an iOS device to a local NAS in the way I can use Time Machine, for example? (Rhetorical question; the answer is obviously that they want to sell more iCloud storage for that all-important services revenue).

replies(3): >>44499534 #>>44499805 #>>44499924 #
12. viraptor ◴[] No.44499473[source]
It doesn't look like it got pushed that much, unfortunately. The article says they only added 20k examples to fine-tune at the end, but maybe the ceiling is much higher for diffusion?

But yeah, RWKV also ends up in a similar performance area with similar sizes - I wish someone started using it at scale finally...

13. hiatus ◴[] No.44499534{3}[source]
Synology supports exactly that, and I'm sure they're not the only one.
replies(1): >>44500145 #
14. dboreham ◴[] No.44499629[source]
It's delusional to think that most software projects can be planned in advance beyond "there will be a beginning, a middle, and an end". People do it, but in my experience their efforts are generally ignored once implementation gets underway.
replies(3): >>44500034 #>>44500765 #>>44501077 #
15. IMTDb ◴[] No.44499659[source]
I would expect a lot of attempts to fail, and those tend not to be published, or gather less attention. So if we have reached a local optimum, any technique that gets close to the current benchmarks is worth publishing as soon as results reach that point. All the ones that are too distant are discarded. In the end, all the papers you see are close to the current status quo.

It's possible that some of these new architectures / optimizations would allow us to go beyond the current benchmark scores, but probably with more training data and money. But to get money you need to show results, which is what you see today. Scaling remains king; maybe one of these techniques is 2025's "attention" paper, but even that one needed a lot of scaling to go from the 2017 version to ChatGPT.

16. tonyedgecombe ◴[] No.44499805{3}[source]
>Why can’t I backup an iOS device to a local NAS

You can backup your iPhone using Finder.

Finder -> Locations -> Your iPhone -> Backup all the data on this iPhone to your Mac.

Once you have done this you can find the backup in "Manage Backups", right click on an entry and select "Show in Finder". From there you can copy it to your NAS.

Not as smooth as a Time Machine backup but it is possible.
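The manual copy step above can be scripted. A minimal sketch, assuming macOS with the NAS share already mounted at /Volumes/nas (both paths are examples to adapt; the MobileSync directory is where Finder keeps local iOS backups):

```shell
#!/bin/sh
# Where Finder (formerly iTunes) stores local iOS backups on macOS
SRC="$HOME/Library/Application Support/MobileSync/Backup/"
# Example NAS mount point; adjust to your share
DEST="/Volumes/nas/iphone-backups/"

if [ -d "$SRC" ]; then
  mkdir -p "$DEST"
  # -a preserves metadata; --delete mirrors backups you removed locally
  rsync -a --delete "$SRC" "$DEST"
else
  echo "No local iOS backups found at $SRC"
fi
```

Run it after each Finder backup, or from a launchd/cron job, to automate the copy-to-NAS step.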

replies(1): >>44500131 #
17. throw0101d ◴[] No.44499924{3}[source]
> Why can’t I backup an iOS device to a local NAS in the way I can use Time Machine, for example?

When I connect my iPhone to my iMac it does to a local backup to a file, which then gets backed up via Time Machine (and SuperDuper/CarbonCopyCloner).

"How to back up your iPhone, iPad, and iPod touch with your Mac":

* https://support.apple.com/en-ca/108796

There's also a checkbox for 'Wifi syncing' so a cable isn't necessarily needed.

replies(1): >>44500112 #
18. Retric ◴[] No.44500034{3}[source]
Planning in software isn’t about following the plan but mapping a viable route to avoid predictable issues. You’re always going to know more about a project as you build it and you should keep updating that plan.
19. lxgr ◴[] No.44500112{4}[source]
That’s exactly my point: Why on Earth do I need a separate computer to mediate the backup?

iOS natively supports SMB over any network connection including wired Ethernet, mounting encrypted APFS volumes on USB storage devices at 10 Gbps etc.

It’s Apple’s explicit vision that an iPad Pro can replace a Mac, even for some professional users. Why don’t these users deserve local backups?

replies(1): >>44500426 #
20. lxgr ◴[] No.44500131{4}[source]
> Not as smooth as a Time Machine backup but it is possible

I’d personally call it “absurdly clunky and intentionally impractical for a big chunk of Apple’s user base”.

21. lxgr ◴[] No.44500145{4}[source]
Full iOS backups directly to local external storage, without another computer in the mix? I’d be very surprised if that were true.
replies(1): >>44500329 #
22. GeekyBear ◴[] No.44500329{5}[source]
Here's one example of a third party tool.

> Step-by-Step Guide: How to Backup iPhone to Synology NAS

https://www.ubackup.com/phone-backup/backup-iphone-to-synolo...

replies(2): >>44500483 #>>44500496 #
23. GeekyBear ◴[] No.44500426{5}[source]
How many people own a NAS, but not a PC or Mac?

Apple already provides first party software to handle iDevice backups on Windows or Mac.

Backing up an Android device to a PC using adb is significantly more difficult, especially for the less technically minded.
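For comparison, the adb flow looks roughly like this (a sketch: `adb backup` is a real but long-deprecated command that requires USB debugging and an on-device confirmation, and many apps opt out of it; the guard is only there so the snippet degrades gracefully without adb installed):

```shell
#!/bin/sh
if command -v adb >/dev/null 2>&1; then
  # Full local backup: app data (-all), APKs (-apk), shared storage (-shared)
  adb backup -apk -shared -all -f android-backup.ab
else
  echo "adb not installed"
fi
```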

replies(1): >>44500513 #
24. msgodel ◴[] No.44500433[source]
Not without fundamentally changing the way they think about computing and there seems to be zero willingness among their leadership to do that. In fact they seem to want to move things into the data center. That's why I'm shorting them.
replies(1): >>44500896 #
25. oefrha ◴[] No.44500483{6}[source]
Did you actually read what you linked, or did you just paste in a random link from a search engine?

There are two methods presented: one only backs up the camera roll; the other requires plugging into a computer and manually clicking around, at which point you might as well use the first party backup built into Finder (or iTunes on Windows? Is that still a thing?), no random third party application needed. I also highly doubt their “backup every single content” claim.

It’s also a sneaky marketing article for that third-party application, following the common SEO practice of giving you a half-assed solution capturing a frequent search term (in this case, “backup iPhone to Synology”), then plugging their own questionable thing as the better solution.

26. lxgr ◴[] No.44500496{6}[source]
> Preparation. How to set up Synology NAS on PC

That’s a guide on how to backup an iPhone to a NAS using a computer.

Unsurprisingly, a reasonably capable general-purpose OS supports network file systems in a way transparent to applications, but that doesn’t help people using only an iOS device.

27. lxgr ◴[] No.44500513{6}[source]
> How many people own a NAS, but not a PC or Mac?

That’s arguably the wrong question: I bet a lot more would own one if they could easily backup their iOS devices to it.

replies(2): >>44500705 #>>44500758 #
28. hnaccount_rng ◴[] No.44500645[source]
But if the limiting factor is the data on which the models are trained and not the actual “computation”, then this would be exactly expected, right?
replies(1): >>44500914 #
29. hnaccount_rng ◴[] No.44500705{7}[source]
The number of people who would buy a NAS over just spending $5/month for storage is well below a percent, and if you combine that with the requirement of not having a PC/Mac you may well end up in the hundreds…

There aren’t that many people who are willing to own a device from a company while not trusting that company with their data.

replies(1): >>44501174 #
30. handfuloflight ◴[] No.44500751[source]
Comparable to which bigger models?
replies(1): >>44510946 #
31. GeekyBear ◴[] No.44500758{7}[source]
I'm willing to bet that more people would backup their Android device if Google provided a first party tool for user friendly backups of Android devices to local computers.
32. handfuloflight ◴[] No.44500765{3}[source]
laughs nervously under a waterfall
33. iwontberude ◴[] No.44500896{3}[source]
I think it’s just a convenient stepping stone more than a long term strategy.
34. Ldorigo ◴[] No.44500914{3}[source]
The data might be the limiting factor of current transformer architectures, but there's no reason to believe it's a general limiting factor of any language model (e.g. human brains are "trained" on orders of magnitude less data and still generally perform better than any model available today).
replies(1): >>44501486 #
35. hardwaresofton ◴[] No.44501028[source]
Would you mind sharing how you arrived at this conclusion? Was there some benchmark that it really shined at? Personal use?
replies(1): >>44510937 #
36. roughly ◴[] No.44501060[source]
> A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.

I had a similar notion and am excited to see this research being done. My experience of writing code is that the structure of the whole system influences each individual part, which has always felt like a better match for a diffusion type model.

I’m suspecting this is a 7B model because it’s an experiment, but I do like seeing Apple playing with smaller models - I think Google’s “no moat” memo is still fundamentally correct, either via better architectures or Moore’s law, and it seems like Apple thinks the same.

replies(1): >>44505434 #
37. lokar ◴[] No.44501077{3}[source]
That’s true at the project level. But surely when you sit down to actually work for a couple hours you think about what you are going to do, and then mostly do that.
replies(1): >>44501570 #
38. overfeed ◴[] No.44501082[source]
> I think Apple will ultimately destroy the data center

I think EVs destroying ultra-large container ships had better odds, and both are extremely unlikely. Data center advantages Apple won't be able to overcome: compute density, cooling, cheap power, physical security to protect the software, scale + bandwidth, and the lower costs to customers of using contract manufacturers and/or commodity hardware.

There is no universe where large enterprises ditch their geo-located racks, let alone hyperscalers, especially now that they are scrounging for energy, reneging on renewables pledges, and paying big bucks to bring nuclear power stations online.

replies(1): >>44513319 #
39. lxgr ◴[] No.44501174{8}[source]
Your numbers might be right, but Apple has implemented niche features, some even requiring expensive per-device hardware, for much less than that.
replies(1): >>44501479 #
40. hnaccount_rng ◴[] No.44501479{9}[source]
Do you have an example?
replies(1): >>44502101 #
41. hnaccount_rng ◴[] No.44501486{4}[source]
That depends on whether these current learning models can really generalise or whether they can only interpolate within their training set
42. layer8 ◴[] No.44501570{4}[source]
In my experience it’s more fractal. Any subgoal, however small, may run into its own planning/thinking and then doing sequence, or even have you reconsider the higher-level plan. Of course, it somewhat depends on how run-of-the-mill the overall task is.
43. lxgr ◴[] No.44502101{10}[source]
All new iPhone models support native DisplayPort output via USB-C, yet I’m not sure 1% of users even have the required cable/adapter.

Some of the power amplifiers for rarely-used bands probably qualify as well (mmWave in particular).

On the software side I’d have to dig a bit, but I bet many code paths on iOS see use of less than 1% of all users.

44. Eggpants ◴[] No.44503219[source]
Most of the "magic" of large models is really just function calls, so as long as the small models have access to the same functions they work well. They fixed the "how many R's in Strawberry" issue by offloading the question to a function, not by spending a godly amount of money/energy on training another model.

Oops, sorry, "Tools". Gotta maintain the grift that these statistics-based lossy-text-compression bar tricks are "thinking".
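The offloading pattern the comment describes can be sketched generically (an illustration of tool calling, not any vendor's actual API; the dict-shaped `tool_call` is a hypothetical parsed model output). Instead of the model counting letters token by token, it emits a tool call and a plain function does the counting:

```python
def count_letter(word: str, letter: str) -> int:
    """Deterministic tool: count occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

# What a model-emitted tool call might look like after parsing
tool_call = {"name": "count_letter",
             "args": {"word": "strawberry", "letter": "r"}}

# Dispatch table mapping tool names to implementations
TOOLS = {"count_letter": count_letter}
result = TOOLS[tool_call["name"]](**tool_call["args"])
print(result)  # 3
```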

45. sitkack ◴[] No.44505434[source]
The "no moat" memo is way more complex than Google admitting an uncomfortable truth. They benefit massively from having seemingly internal documents leaked about how the playing field is fair.
46. miroljub ◴[] No.44510937{3}[source]
Personal use, no benchmark, just a vibe.
47. miroljub ◴[] No.44510946{3}[source]
My previous favourite was qwen2.5-coder.
48. iwontberude ◴[] No.44513319{3}[source]
It’s easy to imagine a universe where the hyperscalers are in a bubble and they will eventually find a limit to adding classical compute and we will hit peak datacenter and shrink from there.