I have been exploring local AI tools for coding (ollama + aider) with a small stock market simulator (~200 lines of python).
First I tried making the AI extract the dataclasses representing events into a separate file. It decided to extract some extra classes, leave others behind, and delete parts of the code.
Then I tried to make it explain one of the actors, called LongVol_player_v1, around 15 lines of code. It correctly concluded that it does options delta hedging, but it jumped to the conclusion that it calculates the implied volatility. In fact I set the vol as a constant, because I'm simulating specific interactions between volatility players and option dealers. It still hasn't caught the bug where the vol player buys 3000 options but accounts for only 2000.
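For context, a minimal sketch of what a delta-hedging actor with a hand-set (constant) implied vol might look like. All names here (LongVolPlayer, rehedge, etc.) are hypothetical illustrations, not the actual simulator code; the point is that the position counter is the kind of state where a bought-3000-but-accounted-2000 bug hides.

```python
from dataclasses import dataclass
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def call_delta(spot: float, strike: float, r: float,
               sigma: float, t: float) -> float:
    # Black-Scholes delta of a European call
    d1 = (log(spot / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    return norm_cdf(d1)


@dataclass
class LongVolPlayer:
    sigma: float            # constant implied vol, set by hand (not computed)
    options_held: int = 0
    shares_held: float = 0.0

    def buy_options(self, qty: int) -> None:
        # single source of truth for the position; the bug class described
        # above is booking fewer options here than were actually bought
        self.options_held += qty

    def rehedge(self, spot: float, strike: float, r: float, t: float) -> float:
        # target a delta-neutral book: short delta shares per long call
        target = -call_delta(spot, strike, r, self.sigma, t) * self.options_held
        trade = target - self.shares_held
        self.shares_held = target
        return trade  # shares to trade now (negative = sell)
```

Usage in this hypothetical shape: `p = LongVolPlayer(sigma=0.2); p.buy_options(3000); p.rehedge(100, 100, 0.0, 1.0)` returns a negative share trade sized against all 3000 options.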
When asked for improvements, it is obsessed with splitting the initialization from the execution.
So far I've wasted half a Saturday trying to make the machine do simple refactors, refactors I could have done myself in half an hour.
I have yet to see the wonders of AI.
My experience is that the hosted frontier models (o3, Gemini 2.5, Claude 4) would handle those problems with ease.
Local models that fit on a laptop are a lot less capable, sadly.
You might start to get useful results if you bump up to the 20B range - Mistral Small 3/3.1/3.2 or one of the ~20B-range Gemma 3 models. Even those are way off the capabilities of the hosted frontier models, though.