
899 points georgehill | 3 comments
Havoc No.36215833
How common is AVX on edge platforms?
replies(2): >>36216269 #>>36217034 #
1. binarymax No.36217034
svantana is correct that PCs are edge, but if you meant "mobile", then the ARM chips in iOS and Android devices typically have NEON instructions for SIMD, not AVX: https://developer.arm.com/Architectures/Neon
replies(1): >>36217403 #
2. Havoc No.36217403
I was thinking more of edge in the distributed serverless sense, but I guess for this type of use the slow part is the compute, not the network latency, so the question doesn't make much sense in hindsight.
replies(1): >>36218877 #
3. binarymax No.36218877
Compute is the latency for LLMs :)

And in general, your inference code will be compiled for a CPU/architecture target, so you can know ahead of time which instructions you'll have access to when writing your code for that target.

For example, in the case of AWS Lambda you can choose Graviton2 (ARM with NEON) or x86_64 (AVX). The trick is that some server processors, such as newer Xeons, support AVX-512, while on others you top out at AVX2, whose vectors are 256 bits wide. You might be able to figure out exactly which instruction set your serverless target supports.