For context, GPT-2-small is 0.124B params (w/ 1024-token context).
I’ve seen a number of “DIY GPT-2” tutorials that target this sweet spot. You won’t get amazing results unless you’re willing to leave a personal computer running for hours or days and you have solid data to train on locally, but fine-tuning should be within the realm of normal hobbyists’ patience.
For a run that short, though, you'll spend more time waiting for the node to come up, downloading the dataset, and compiling the model than actually training.
The general guidance I've used is that to train a model, you need an amount of RAM (or VRAM) equal to roughly 8 bytes per parameter, so a 0.125B model would need about 1 GB of RAM to train.
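That rule of thumb is just arithmetic, so here's a minimal sketch of it; the 8-bytes-per-parameter figure is the heuristic from the comment above (it doesn't account for activations or batch size, which add more on top), and the function name is my own:

```python
def training_ram_gb(num_params: float, bytes_per_param: int = 8) -> float:
    """Rule-of-thumb RAM/VRAM (in GB) to train a model of num_params parameters.

    Assumes ~8 bytes per parameter to cover weights, gradients, and
    optimizer state; activations and batch size are not included.
    """
    return num_params * bytes_per_param / 1e9

# GPT-2-small at ~0.124B parameters:
print(f"{training_ram_gb(0.124e9):.2f} GB")  # 0.99 GB
```

For GPT-2-small the estimate lands right around 1 GB, which is why it fits comfortably on almost any modern GPU.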
While interactive AI is all about posing a prompt, meditating on it, then trying to fix the outcome, IntelliJ tab completion... shows what it will complete as you type, and you hit Tab when you're 100% OK with the completion, which surprisingly happens 90–99% of the time for me, depending on the project.