The new Mali GPUs look pretty good too, with +20% performance while being 9% more power efficient.
And SME2-enabled Armv9.3 cores for on-device AI don't sound bad either.
Curious to see how much of this new arch will actually be adopted by Qualcomm, or whether they will diverge further with their own (Nuvia-acquired) architecture.
Either way, I hope the result isn't fragmentation in the market (e.g. developers not making use of next-gen ARM features because Qualcomm doesn't support them).
Am I missing something...?
The last time I tried to run local LLMs on my 7900XT using LM Studio, even with 20 GB of VRAM, they were borderline usable. Fast enough, but the quality of the answers and generated code was complete and utter crap. Not even in the same ballpark as Claude Code or GPT-4/5. I'd love to run some kind of supercharged command-line completion on there, though.
Edit: I guess my question is: what exactly justifies the extra transistors that Arm here, and also AMD with their "AI MAX" parts, keep stuffing onto their chips?
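For the command-line-completion idea, here's a minimal sketch, assuming LM Studio's local server is running with a small code model loaded (it exposes an OpenAI-compatible API, on port 1234 by default); the "complete this shell command" prompt and the script itself are my own illustration, not anything LM Studio ships:

```python
#!/usr/bin/env python3
"""Toy shell-command completion against a local LLM.

Assumes an OpenAI-compatible local server (e.g. LM Studio's, or
llama.cpp's llama-server) is listening on localhost:1234 with some
model loaded; adjust ENDPOINT for your setup.
"""
import json
import sys
import urllib.request

ENDPOINT = "http://localhost:1234/v1/chat/completions"  # LM Studio default port

def complete_command(partial: str) -> str:
    payload = {
        "model": "local-model",  # most local servers just use whatever is loaded
        "messages": [
            {"role": "system",
             "content": "Complete the user's partial shell command. "
                        "Reply with the full command only, no explanation."},
            {"role": "user", "content": partial},
        ],
        "temperature": 0.2,
        "max_tokens": 64,
    }
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()

if __name__ == "__main__":
    # e.g.  python complete.py "ffmpeg -i in.mkv, reencode to h264, keep audio"
    print(complete_command(" ".join(sys.argv[1:])))
```

Wired into a shell keybinding, even a small 7B-class model is usually fine for this kind of narrow, low-stakes task, which is a very different bar from matching Claude Code or GPT-4/5 on general coding.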
Arm C1-Ultra is the successor of Cortex-X925. C1-Ultra has great improvements in single-thread performance, but Cortex-X925 had very poor performance per die area, which made it totally unsuitable for server CPUs. Arm has not said anything about the performance per area of C1-Ultra, so I assume that it continues to be poor.
Arm C1-Pro is the successor of Cortex-A725. Arm has made server versions of the Cortex-A7xx, but Amazon did not want them for Gravitons, considering them too weak.
Therefore only Arm C1-Premium could have a server derivative that would become the successor of Neoverse V3 for a future Graviton.
For now, the technical manual of C1-Premium is very sparse. Only when the optimization guide for C1-Premium is published, showing its microarchitecture, will we know whether it is a worthy replacement for Cortex-X4/Neoverse V3, which had the best performance per die area among the previous Arm CPU cores.
Maybe also worth mentioning that the rk3588 uses Cortex-A76 cores, which Arm announced in 2018, so it was a four-year-old design at the time of release. At this pace it seems to take the better part of a decade to get an Arm core out and generally usable.
I really, really hope some of this video encoding work helps lay a foundation for making further mainline VPUs easier. I bought a cute small rk3566 board hoping to make a cheap low-power WiFi video transmitter, and of course it requires a truly prehistoric vendor-provided kernel to take advantage of the VPU, alas. Scant hope of this ever improving, but maybe some decade drivers won't be a Scythian nightmare.
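If it helps anyone poking at the same boards: a minimal sketch for checking which V4L2 devices the running kernel actually exposes, which is a quick way to see whether a mainline build is registering the SoC's VPU at all. It just reads sysfs; exact device names vary by kernel version, so it doesn't assume any particular driver string.

```python
#!/usr/bin/env python3
"""Quick probe: which V4L2 devices does the running kernel expose?

Useful on RK35xx boards to see whether a (mainline) kernel registered
the VPU as a V4L2 codec device, or whether only cameras/USB devices
are present.
"""
from pathlib import Path

def list_v4l2_devices() -> None:
    base = Path("/sys/class/video4linux")
    if not base.exists():
        print("No V4L2 devices registered (or video4linux support is off).")
        return
    for dev in sorted(base.iterdir()):
        # Each registered device exposes a human-readable "name" attribute.
        name = (dev / "name").read_text().strip()
        print(f"/dev/{dev.name}: {name}")

if __name__ == "__main__":
    # Look for Rockchip/Hantro-looking entries in the output; if none show
    # up, the kernel in use isn't driving the VPU.
    list_v4l2_devices()
```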
It's nice seeing a second player come to the GPU/video space at least. Imagination GPUs are in the new Pixel phone! And a bunch of various designs here and there. Maybe they can get religion and work a little harder than others have at upstreaming. There were some promising early mainlinings, but I've not seen much in kernelnewbies release logs for a while now: troubling silence.