←back to thread

899 points georgehill | 1 comments | | HN request time: 0.408s | source
Show context
world2vec ◴[] No.36216161[source]
Might be a silly question but is GGML a similar/competing library to George Hotz's tinygrad [0]?

[0] https://github.com/geohot/tinygrad

replies(2): >>36216187 #>>36218539 #
qeternity ◴[] No.36216187[source]
No, GGML is a CPU optimized library and quantized weight format that is closely linked to his other project llama.cpp
replies(2): >>36216244 #>>36216266 #
stri8ed ◴[] No.36216244[source]
How does the quantization happen? Are the weights preprocessed before loading the model?
replies(2): >>36216303 #>>36216321 #
1. ggerganov ◴[] No.36216321[source]
The weights are preprocessed into integer quants combined with scaling factors in various configurations (4, 5, 8-bits and recently more exotic 2, 3 and 6-bit quants). At runtime, we use efficient SIMD implementations to perform the matrix multiplication at integer level, carefully optimizing for both compute and memory bandwidth. Similar strategies are applied when running GPU inference - using custom kernels for fast Matrix x Vector multiplications