Would love a voice assistant running locally, but there are probably solutions out there already. I didn't get to do the research yet.
In short, you can:
1) Run a local Willow Inference Server (WIS)[1]. It supports CPU or CUDA and is just about the fastest implementation of Whisper out there for "real time" speech.
2) Run local command detection on device. We pull your Home Assistant entities on setup and define a basic grammar for them, but any English commands (up to 400) are supported. They are recognized directly on the $50 ESP BOX device and sent to Home Assistant (or openHAB, or a REST endpoint, etc.) for processing.
Whether you use WIS or local detection, our performance target is 500ms from end of speech to command executed.
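To give a rough idea of what "define a basic grammar" for your entities could look like in option 2, here is a minimal sketch. It is not Willow's actual code: the function name `build_commands`, the verb list, and the normalization rules are all assumptions made for illustration. The only detail taken from above is the 400 command cap.

```python
MAX_COMMANDS = 400  # on-device command limit mentioned above


def build_commands(entity_names, verbs=("turn on", "turn off"), limit=MAX_COMMANDS):
    """Build '<verb> <entity>' phrases from entity names (hypothetical helper).

    Names are lowercased and whitespace-normalized, duplicates are dropped,
    and the result is truncated once `limit` phrases have been collected.
    """
    commands = []
    seen = set()
    for name in entity_names:
        cleaned = " ".join(name.lower().split())
        if not cleaned:
            continue  # skip empty or whitespace-only names
        for verb in verbs:
            phrase = f"{verb} {cleaned}"
            if phrase in seen:
                continue
            if len(commands) >= limit:
                return commands  # cap reached, ignore the rest
            seen.add(phrase)
            commands.append(phrase)
    return commands


if __name__ == "__main__":
    for phrase in build_commands(["Kitchen Light", "Office Fan"]):
        print(phrase)
```

With two verbs per entity, a cap of 400 would cover about 200 entities in this scheme. A real grammar would probably want more verbs and friendlier names, so treat the numbers as illustrative only.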