v64 No.35028738
If anyone is interested in running this at home, please follow the llama-int8 project [1]. LLM.int8() is a recent development allowing LLMs to run in half the memory without loss of performance [2]. Note that at the end of [2]'s abstract, the authors state "This result makes such models much more accessible, for example making it possible to use OPT-175B/BLOOM on a single server with consumer GPUs. We open-source our software." I'm very thankful we have researchers like this further democratizing access to this data and prying it out of the hands of the gatekeepers who wish to monetize it.

[1] https://github.com/tloen/llama-int8

[2] https://arxiv.org/abs/2208.07339
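For a concrete picture, this is roughly what running a model with LLM.int8() weight quantization looks like via Hugging Face transformers plus bitsandbytes; a minimal sketch, where the model name and flags are illustrative assumptions rather than anything taken from the llama-int8 repo:

    # Minimal sketch (assumes transformers, accelerate and bitsandbytes are installed).
    # The model name is a placeholder, not from the linked projects.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-6.7b"  # hypothetical example model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,   # LLM.int8() quantization via bitsandbytes
        device_map="auto",   # place layers on available GPU(s)/CPU
    )

    inputs = tokenizer("Quantized models can", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))

The int8 weights roughly halve the memory footprint compared with float16, which is why a 175B-class model starts to fit on a single multi-GPU server as the abstract describes.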

swyx No.35029601
why is it that these models tend to be released as float16 and converting to int8 is left to the reader? is there something special about training that defaults you to float16?
dspillett No.35030321
Precision, assuming those names refer to standard binary numeric types. IEEE 754 16-bit floats carry an 11-bit significand (roughly 3 significant decimal digits), so by converting to 8-bit integers you lose some of that. Depending on the distribution of the values in those floats you could be losing a lot more detail than this would imply, which is the reason we use floating point numbers for anything in the first place (rather than using an int16, where you have greater precision at your maximum scale but much less at smaller scales).

So if the model is computed using float16s, distribute it as-is and let the end user choose to use it like that, or compromise for faster processing if their system can deal with many billions of int8s more effectively.
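To make the precision trade-off concrete, here is a tiny numpy sketch of absmax int8 quantization (one common scheme; the per-block details in LLM.int8() are more involved than this):

    import numpy as np

    # Toy absmax quantization: map a float16 weight vector to int8 and back.
    w = np.array([0.0021, -0.013, 0.27, -0.81, 0.094], dtype=np.float16)

    w32 = w.astype(np.float32)
    scale = 127.0 / np.abs(w32).max()            # largest magnitude maps to 127
    q = np.round(w32 * scale).astype(np.int8)    # int8 weights
    w_hat = (q / scale).astype(np.float16)       # dequantized approximation

    print(q)                  # e.g. [   0   -2   42 -127   15]
    print(np.abs(w - w_hat))  # rounding error; small weights lose the most
                              # relative precision, as described above

That is the detail being traded away: values near the maximum survive almost unchanged, while small values land on a much coarser grid than float16 gave them.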
