Edit: I read the linked README.
> I was impatient and curious to try to run 65B on an 8xA100 cluster
Well?
Oracle gives you a $300 free trial, which works out to over 10 hours on a BM.GPU4.8 instance: enough for a focused day of prompting.
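For anyone who wants to sanity-check that figure against current pricing, here is a rough sketch of the arithmetic. The function names are made up for illustration, and no actual hourly rate is assumed; the only inputs taken from the comment above are the $300 credit and the 10-hour claim.

```python
def runtime_hours(credit_usd: float, hourly_rate_usd: float) -> float:
    """Hours of runtime a credit buys at a given hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return credit_usd / hourly_rate_usd


def max_hourly_rate(credit_usd: float, min_hours: float) -> float:
    """Highest hourly rate at which the credit still lasts min_hours."""
    if min_hours <= 0:
        raise ValueError("min_hours must be positive")
    return credit_usd / min_hours


# "Over 10 hours" on a $300 credit implies the instance costs
# less than this many dollars per hour.
print(max_hourly_rate(300, 10))
```

Plug the rate from the provider's price list into `runtime_hours` to see how long the trial credit would actually last.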
I might be suffering from FOMO to some degree. I've just got to tell myself that this won't be the only time model weights get leaked!
This certainly sounds a lot like whining that others aren’t doing the work you yourself don’t want to do.
I'm not in a position to put in any meaningful work towards optimising this model for lower-end hardware, or working on the tooling/documentation/user experience.