
899 points georgehill | 9 comments
world2vec ◴[] No.36216161[source]
Might be a silly question, but is GGML a similar/competing library to George Hotz's tinygrad [0]?

[0] https://github.com/geohot/tinygrad

replies(2): >>36216187 #>>36218539 #
1. qeternity ◴[] No.36216187[source]
No, GGML is a CPU-optimized library and quantized weight format that is closely linked to its author's other project, llama.cpp.
replies(2): >>36216244 #>>36216266 #
2. stri8ed ◴[] No.36216244[source]
How does the quantization happen? Are the weights preprocessed before loading the model?
replies(2): >>36216303 #>>36216321 #
3. ggerganov ◴[] No.36216266[source]
ggml started with a focus on CPU inference, but lately we have been augmenting it with GPU support. Although still in development, it already has partial CUDA, OpenCL and Metal backend support.
replies(3): >>36216327 #>>36216442 #>>36219452 #
4. sebzim4500 ◴[] No.36216303[source]
Yes, but to my knowledge it doesn't do any of the complicated optimization stuff that SOTA quantisation methods use. It basically is just doing a bunch of rounding.

There are advantages to simplicity, after all.
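
To make "a bunch of rounding" concrete, here is a minimal sketch of round-to-nearest quantization with one shared scale per block. It is an illustration under assumptions, not ggml's actual code or format: the function names, the 8-value block and the symmetric [-7, 7] range are all made up for the example.

```python
def quantize_rtn(values, bits=4):
    """Round-to-nearest quantization of one block with a single shared scale.

    Hypothetical helper for illustration; not a ggml function.
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit signed values
    amax = max(abs(v) for v in values)            # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0      # avoid dividing by zero
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quants


def dequantize(scale, quants):
    """Recover approximate floats from the integers and the block scale."""
    return [scale * q for q in quants]


# Example block of weights (made-up numbers).
block = [0.12, -0.50, 0.33, 0.07, -0.21, 0.44, -0.02, 0.29]
scale, quants = quantize_rtn(block)
restored = dequantize(scale, quants)

# Rounding to the nearest integer keeps every value within half a step.
max_err = max(abs(a - b) for a, b in zip(block, restored))
```

The only information kept per block is the scale plus one small integer per weight, which is where the memory savings come from. No calibration data or error-minimizing search is involved in this sketch.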

replies(1): >>36216416 #
5. ggerganov ◴[] No.36216321[source]
The weights are preprocessed into integer quants combined with scaling factors in various configurations (4-, 5- and 8-bit, and recently more exotic 2-, 3- and 6-bit quants). At runtime, we use efficient SIMD implementations to perform the matrix multiplication at integer level, carefully optimizing for both compute and memory bandwidth. Similar strategies are applied when running GPU inference, using custom kernels for fast Matrix x Vector multiplications.
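
A rough sketch of how integer quants and scaling factors can combine in a dot product. Everything here is assumed for illustration: the block size, the 8-bit range and the helper names are hypothetical, and the real kernels use SIMD and packed formats rather than Python lists.

```python
BLOCK = 4  # hypothetical block size, kept tiny for readability


def quantize_blocks(vec, bits=8):
    """Split vec into blocks; store one float scale plus integers per block."""
    qmax = 2 ** (bits - 1) - 1
    blocks = []
    for i in range(0, len(vec), BLOCK):
        chunk = vec[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / qmax if amax > 0 else 1.0
        blocks.append((scale, [round(v / scale) for v in chunk]))
    return blocks


def quantized_dot(a_blocks, b_blocks):
    """Dot product where the inner loop touches integers only."""
    total = 0.0
    for (sa, qa), (sb, qb) in zip(a_blocks, b_blocks):
        int_sum = sum(x * y for x, y in zip(qa, qb))  # integer multiply-add
        total += sa * sb * int_sum                    # one float rescale per block
    return total


# Made-up weight row and activation vector.
w = [0.5, -1.25, 0.75, 2.0, -0.1, 0.3, 0.9, -0.6]
x = [1.0, 0.5, -2.0, 0.25, 3.0, -1.5, 0.2, 0.8]

exact = sum(a * b for a, b in zip(w, x))
approx = quantized_dot(quantize_blocks(w), quantize_blocks(x))
```

The point of the layout is that the per-element work is integer multiply-add, and the floating-point scales are applied once per block rather than once per element.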
6. qeternity ◴[] No.36216327[source]
Hi Georgi - thanks for all the work, have been following and using since the availability of Llama base layers!

Wasn’t implying it’s CPU-only, just that it started as a CPU-optimized library.

7. brucethemoose2 ◴[] No.36216416{3}[source]
It's not so simple anymore; see https://github.com/ggerganov/llama.cpp/pull/1684
8. freedomben ◴[] No.36216442[source]
As a person burned by nvidia, I can't thank you enough for the OpenCL support
9. ignoramous ◴[] No.36219452[source]
(a novice here who knows a couple of fancy terms)

> ...lately we have been augmenting it with GPU support.

Would you say you'd then be building an equivalent to Google's JAX?

Someone even asked if anyone would build a C++ to JAX transpiler [0]... I am wondering if that's something you may implement? Thanks.

[0] https://news.ycombinator.com/item?id=35475675