
899 points georgehill | 3 comments
Havoc No.36215833
How common is AVX on edge platforms?
replies(2): >>36216269 #>>36217034 #
1. binarymax No.36217034
svantana is correct that PCs are edge, but if you meant "mobile", then the ARM chips in iOS and Android devices typically have NEON instructions for SIMD, not AVX: https://developer.arm.com/Architectures/Neon
replies(1): >>36217403 #
2. Havoc No.36217403
I was thinking more of edge in the distributed serverless sense, but I guess for this type of use the slow part is the compute, not the network latency, so the question doesn't make much sense in hindsight.
replies(1): >>36218877 #
3. binarymax No.36218877
Compute is the latency for LLMs :)

And in general, your inference code will be compiled for a CPU/architecture target, so you can know ahead of time which instructions you'll have access to when writing your code for that target.

For example, in the case of AWS Lambda you can choose Graviton2 (ARM with NEON) or x86_64 (AVX). The trick is that some server processors, such as newer Xeons, support AVX-512, while on others you top out at AVX2, whose vectors are 256 bits wide. You might be able to figure out exactly which instruction set your serverless target supports.