343 points sillysaurusx | 12 comments
linearalgebra45 No.35028638
Enough time has passed since this leaked, so my question is: why aren't there already blog posts of people blowing their $300 of starter credit with ${cloud_provider} on a few hours of experimentation running inference on this 65B model?

Edit: I read the linked README.

> I was impatient and curious to try to run 65B on an 8xA100 cluster

Well?
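A back-of-envelope check (the byte-per-parameter sizes below are the standard fp32/fp16/int8 figures, not from the thread) shows why running 65B naively called for something like 8xA100:

    # Rough weight-memory math for LLaMA-65B, ignoring activations/KV cache.
    PARAMS = 65e9

    for dtype, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        print(f"{dtype}: ~{PARAMS * bytes_per_param / 1e9:.0f} GB of weights")

    # fp16 needs ~130 GB, which no single GPU of the era could hold,
    # but fits comfortably across 8x A100-40GB (320 GB total).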

replies(2): >>35028936, >>35029766
v64 No.35028936
The compute necessary to run 65B naively was only available on AWS (and perhaps Azure; I don't work with them), and the required instance types have been unavailable to the public recently (it seems everyone had the same idea to hop on this and try to run it). As I noted in my other post here [1], the memory requirements have since been lowered by other work, and it should now be possible to run the 65B model on a provider like CoreWeave.

[1] https://news.ycombinator.com/item?id=35028738
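One of the techniques that lowered memory requirements around this time was 8-bit quantized loading (which may or may not be the specific work referenced in [1]). A minimal sketch with Hugging Face transformers and bitsandbytes, assuming a locally converted checkpoint at ./llama-65b-hf; the path is hypothetical, and transformers support for these weights was still being merged when this thread was written:

    # Sketch: load a large checkpoint in 8-bit, sharded across available GPUs.
    # Assumes `pip install transformers accelerate bitsandbytes` and a local
    # converted copy of the weights at ./llama-65b-hf (hypothetical path).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("./llama-65b-hf")
    model = AutoModelForCausalLM.from_pretrained(
        "./llama-65b-hf",
        device_map="auto",   # let accelerate shard layers across GPUs
        load_in_8bit=True,   # ~1 byte/param instead of 2 for fp16
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))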

replies(2): >>35029106, >>35029766
1. linearalgebra45 No.35029106
Are you sure about that? I can't remember where I saw the table of memory requirements, but some of the larger instances here [1] will surely be able to cope (assuming they're available!).

Oracle gives you a $300 free trial, which equates to running BM.GPU4.8 for over 10 hours: enough for a focused day of prompting.

[1] https://www.oracle.com/cloud/compute/gpu/
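The arithmetic behind "over 10 hours" works out under Oracle's list pricing; the per-GPU rate below is an assumption, not from the thread:

    # Hypothetical cost check for the $300 trial on BM.GPU4.8 (8x A100 40GB).
    CREDIT_USD = 300.00
    A100_USD_PER_GPU_HOUR = 3.05  # assumed list price, not from the thread
    GPUS = 8

    hourly = A100_USD_PER_GPU_HOUR * GPUS       # ~$24.40/hr for the shape
    print(f"~{CREDIT_USD / hourly:.1f} hours")  # ~12.3 hours on credit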

replies(3): >>35029110, >>35030261, >>35034167
2. v64 No.35029110
> Are you sure about that?

I'm not. The only way to know is to try :) Thank you for the link!

replies(1): >>35029159
3. linearalgebra45 No.35029159
You only get a single month-long window to spend the credit! And I'm sure not going to spend any of my own money on prompting experiments.

I might be suffering from FOMO to some degree; I've just got to tell myself that this won't be the only time model weights get leaked!

replies(1): >>35030979
4. smoldesu No.35030261
Thanks for sharing it! I'm using their "Always Free" tier to host an Ampere-accelerated GPT-J chatbot right now. Works like a charm, and best of all, it's free!
replies(2): >>35030667, >>35031376
5. jocaal No.35030667
I don't understand; the Ampere they refer to in their free tier are CPUs, not GPUs. How did you manage to do that?
replies(1): >>35030865
6. smoldesu No.35030865
Custom PyTorch with on-chip acceleration: https://cloudmarketplace.oracle.com/marketplace/en_US/listin...

Not as fast as a GPU, but less than 5 seconds for a 250 token response is good enough for a Discord bot.
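The code itself would be ordinary transformers usage; the Marketplace image above ships a PyTorch build with the CPU acceleration baked in. A minimal sketch, assuming GPT-J 6B and the Always Free A1 shape's 4 OCPUs (the thread count is an assumption; fp32 weights are a tight fit in that shape's 24 GB RAM):

    # Sketch: GPT-J 6B inference on an Ampere A1 (arm64) CPU instance.
    # Nothing vendor-specific here; the speedup comes from the PyTorch build.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.set_num_threads(4)  # match the Always Free shape's 4 OCPUs

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
    model.eval()

    inputs = tokenizer("Hello! Tell me about Ampere CPUs.", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=250, do_sample=True)
    print(tokenizer.decode(out[0], skip_special_tokens=True))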

replies(1): >>35034200
7. mynameisvlad No.35030979
> And I'm sure not going to spend any of my own money on prompting experiments.

This certainly sounds a lot like whining that others aren’t doing the work you yourself don’t want to do.

replies(1): >>35031273
8. linearalgebra45 No.35031273
"Prompting experiments" is just my use case. According to v64, a lot of people have had the same idea of spinning up a trial instance to run inference, which is unsurprising.

I'm not in a position to put in any meaningful work towards optimising this model for lower-end hardware, or towards the tooling/documentation/user experience.

9. damascus No.35031376
Do you have any code from your Discord bot you're willing to share? I'd be happy to share back any updates I make to it. I've been wanting to play with this idea for a bit.
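Not the poster's code, but a minimal sketch of how such a bot is typically wired up with discord.py; the !ask command, the generate() stub, and the token handling are all illustrative:

    # Sketch: minimal Discord bot that forwards prompts to a local model.
    # Assumes `pip install discord.py`; DISCORD_TOKEN comes from the env.
    import os
    import discord

    def generate(prompt: str) -> str:
        # Placeholder: swap in the GPT-J generation code sketched above.
        return f"(model reply to: {prompt})"

    intents = discord.Intents.default()
    intents.message_content = True  # required to read message text
    client = discord.Client(intents=intents)

    @client.event
    async def on_message(message: discord.Message):
        if message.author == client.user:
            return
        if message.content.startswith("!ask "):
            prompt = message.content[len("!ask "):]
            await message.channel.send(generate(prompt)[:2000])  # 2000-char limit

    client.run(os.environ["DISCORD_TOKEN"])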
replies(1): >>35032653
10. No.35032653
11. fswd No.35034167
If you actually try to do this, the sales people will stop you due to some internal rule: no GPUs on free credit. Unless the situation has changed, of course...
12. nl No.35034200
This is the most interesting thing I've read in this thread. How have I never heard of this accelerator?!