JetBrains has 100MB per-language models for their IDEs that can autocomplete single lines. They're good, but I think we can do better for local code autocomplete. I hope Apple succeeds in their on-device AI attempts.
You can run Qwen3 locally today if you want to. It can write whole files, although not with the <1 second latency a sub-1GB model gives you, which is what you want for interactive in-editor completions.