
343 points sillysaurusx | 3 comments
linearalgebra45 No.35028638
It's been enough time since this leaked, so my question is: why aren't there already blog posts of people blowing their $300 of starter credit with ${cloud_provider} on a few hours' experimentation running inference on this 65B model?

Edit: I read the linked README.

> I was impatient and curious to try to run 65B on an 8xA100 cluster

Well?
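For scale, a quick back-of-the-envelope check (my own numbers, not from the thread) of why a 65B-parameter model wants a multi-GPU box like 8xA100 in the first place:

```python
# Rough memory estimate for serving a 65B-parameter model in fp16.
# Assumptions (mine, not from the thread): 2 bytes/param, weights
# sharded evenly across GPUs, ignoring KV cache / activation overhead,
# and the 40 GB A100 variant.
PARAMS = 65e9
BYTES_PER_PARAM = 2   # fp16 / bf16
NUM_GPUS = 8
GPU_MEM_GB = 40

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
per_gpu_gb = weights_gb / NUM_GPUS

print(f"weights: {weights_gb:.0f} GB total, {per_gpu_gb:.2f} GB per GPU")
print(f"fits in {NUM_GPUS}x{GPU_MEM_GB} GB A100s:", per_gpu_gb < GPU_MEM_GB)
```

So the bare fp16 weights alone are ~130 GB: far beyond any single consumer GPU, but a comfortable ~16 GB shard per GPU on an 8xA100 node, which is consistent with the README author's choice of cluster.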

1. ulnarkressty No.35030027
https://medium.com/@enryu9000/mini-post-first-look-at-llama-...

*later edit - not the 65B model, but the smaller ones. Performance seems mixed at first glance, not really competitive with ChatGPT fwiw.

2. linearalgebra45 No.35030082
> not the 65B model, but the smaller ones

Haha, that's right! I saw that one too

3. minxomat No.35031470
> not really competitive with ChatGPT

That's impossible to judge. LLaMA is a foundation model. It has received neither instruction fine-tuning (as in text-davinci-003) nor RLHF (as in ChatGPT). It can't fairly be compared to those fine-tuned models without, well, fine-tuning.
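To make that distinction concrete, here's a toy sketch (my own illustration, not from the thread; the instruction template is Alpaca-style and hypothetical) of how the same question has to be framed for a base model versus an instruction-tuned one:

```python
def base_prompt(question: str) -> str:
    # A base (foundation) model only does next-token prediction, so you
    # coax an answer out of it by framing the text as a continuation.
    return f"Q: {question}\nA:"

def instruct_prompt(question: str) -> str:
    # An instruction-tuned model was further trained on
    # (instruction, response) pairs, so it expects a template like this
    # (Alpaca-style; hypothetical example, not LLaMA's own format).
    return (
        "Below is an instruction. Write a response that completes it.\n\n"
        f"### Instruction:\n{question}\n\n"
        "### Response:\n"
    )

print(base_prompt("What is RLHF?"))
print(instruct_prompt("What is RLHF?"))
```

Asked a bare question, a base model may just continue with more question-like text; the instruction template is what the fine-tuned model was trained to answer, which is why comparing a raw foundation model against ChatGPT on chat prompts is apples to oranges.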