Any Halo Strix laptop will do; I've been using the HP ZBook Ultra G1a with 128GB of unified memory.
Mostly with 20B-parameter models, but it can load larger ones.
I find local models (gpt-oss-20b) make good quick references, but if you want to refactor or do anything like that you need a bigger model.
I'm running llama.cpp directly and using the API it exposes, either from Neovim's avante plugin or a CLI tool like aichat. It also comes with a basic web interface.
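For anyone curious about the setup, here's a minimal sketch of what that looks like: llama.cpp's `llama-server` binary serves an OpenAI-compatible API that clients like avante or aichat can point at. The model path and port below are assumptions for illustration; adjust for your own download.

```shell
# Start llama.cpp's server on an assumed local GGUF file
# (hypothetical path; substitute your own model download).
llama-server -m ~/models/gpt-oss-20b.gguf --port 8080

# In another terminal: the server exposes an OpenAI-compatible
# endpoint, so any client that speaks that API will work.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What does memmove do?"}
        ]
      }'
```

The same `http://localhost:8080/v1` base URL is what you'd configure in avante or aichat as an OpenAI-compatible provider; the built-in web UI is served at the server's root.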