I had to create a custom llama.cpp image compiled with Vulkan so that LLMs running inside containers can use the GPU on my MacBook Air M4 for inference. It's much faster, roughly 8-10x faster than running on the CPU alone.
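Roughly, it comes down to building llama.cpp with the Vulkan backend enabled inside a Linux image. Here's a minimal sketch of that kind of Dockerfile (the base image, package names, and runtime driver setup are illustrative assumptions, not the exact image I use):

```dockerfile
# Minimal sketch of a Vulkan-enabled llama.cpp image; adjust packages
# and base image for your distro and container runtime.
FROM ubuntu:24.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential cmake git ca-certificates libvulkan-dev glslc
RUN git clone --depth 1 https://github.com/ggml-org/llama.cpp /src
# GGML_VULKAN=ON compiles the Vulkan backend; static libs keep the
# final copy step simple, and curl is disabled to trim dependencies.
RUN cmake -S /src -B /src/build \
      -DGGML_VULKAN=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF \
      -DCMAKE_BUILD_TYPE=Release \
    && cmake --build /src/build -j

FROM ubuntu:24.04
# libvulkan1 is the Vulkan loader; mesa-vulkan-drivers provides the ICD
# the container-side GPU is exposed through -- this part depends on how
# your container runtime passes the GPU in.
RUN apt-get update && apt-get install -y --no-install-recommends \
    libvulkan1 mesa-vulkan-drivers libgomp1 && rm -rf /var/lib/apt/lists/*
COPY --from=build /src/build/bin/llama-server /usr/local/bin/llama-server
ENTRYPOINT ["llama-server", "--host", "0.0.0.0"]
```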
To be honest, so far I've mostly been using cloud models for coding; the local models haven't been that great.
Some more details on the blog: https://markjgsmith.com/posts/2025/10/12/just-use-llamacpp