Void. I'd rather run my own models locally https://voideditor.com/
replies(1):
https://blog.steelph0enix.dev/posts/llama-cpp-guide/#quantiz...
And I get fast enough autocomplete results for it to be useful. I have an NVIDIA RTX 4060 in a laptop with 8 gigs of dedicated memory that I use for it. I still use Claude for chat (pair programming) though, and I don't really use agents.
Have a 3950X w/ 32GB RAM, a Radeon VII & a 6900XT sitting in the closet hosting smaller models, and then a 5800X3D/128GB/7900XTX as my main machine.
Most any quantized model that fits in half of the VRAM of a single GPU (and ideally supports flash attention, optionally speculative decoding) will give you far faster autocompletes. This is especially the case with the Radeon VII thanks to its memory bandwidth.
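If you want to sanity check the "fits in half of the VRAM" rule before downloading anything, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, and the 4.5 bits per weight figure is just an assumed ballpark for a 4-bit-ish quant, not something measured. It only counts weights and ignores KV cache and context, which is presumably part of why leaving half the VRAM free helps. The bandwidth function is a crude ceiling, assuming every generated token has to read all the weights once.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB: parameters * bits / 8."""
    return params_billion * bits_per_weight / 8.0


def fits_in_half_vram(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """Rule of thumb from above: the weights should take at most half the VRAM."""
    return model_size_gb(params_billion, bits_per_weight) <= vram_gb / 2.0


def max_tokens_per_second(bandwidth_gb_s: float, size_gb: float) -> float:
    """Crude upper bound: each token reads all weights once from memory."""
    return bandwidth_gb_s / size_gb


if __name__ == "__main__":
    # Hypothetical example: a 7B model at an assumed ~4.5 bits per weight on an 8 GB card.
    size = model_size_gb(7, 4.5)
    print(f"approx. weight size: {size:.2f} GB")
    print("fits in half of 8 GB:", fits_in_half_vram(7, 4.5, 8))
```

Plug in your own card's VRAM and memory bandwidth. Real throughput will be lower than the ceiling, but it shows why a card with more memory bandwidth can feel quicker on the same model.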