I recently made a little tool to help people interested in running local LLMs figure out whether their hardware can fit an LLM in GPU memory.
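For anyone curious what that check roughly amounts to, here's a minimal sketch (my own illustration, not the tool's actual code): estimate the weights' footprint from parameter count and quantisation, add a rough overhead factor for KV cache and runtime buffers, and compare against available VRAM. The helper names and the overhead factor are assumptions for illustration.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 16,
                     overhead_factor: float = 1.2) -> float:
    """Approximate GPU memory (GB) needed to hold the weights, with a
    fudge factor for KV cache, activations, and runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead_factor / 1e9


def fits_in_gpu(params_billion: float, vram_gb: float,
                bits_per_weight: int = 16) -> bool:
    """True if the estimated footprint fits in the given VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # Example: a 7B model quantised to 4 bits on a 12 GB card.
    need = estimate_vram_gb(7, bits_per_weight=4)
    print(f"~{need:.1f} GB needed; fits in 12 GB: {fits_in_gpu(7, 12, 4)}")
```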
replies(10):
I appreciate that it's a heavy site, but to be honest, it doesn't seem worth the time to optimise it by moving to another, lighter framework at this stage of the project.
Sorry!