All I can say is that, in my experience, this is the difference between wanting something to be true and it actually being true.
> 8B models are extremely fast and cheap to run
Yes.
> Combined with good RAG they can do very well.
This is simply not true. They perform at a level which is useful for simple, trivial tasks.
If you consider that 'doing well', then sure.
However, if you want to write scripts, which is specifically what the parent post asked about, then heck, which 8B model are you using? Llama 3.1 is shit at it out of the box.
¯\_(ツ)_/¯
Getting a working unit test can take 6 or 7 iterations, even with a good prompt. Forget writing logic. Creating classes? Using RAG to execute functions from a spec? Forget it.
That's not the level I need from an assistant.