Looks very nice, saved it for later. Last week, I worked on implementing always-on speech-to-text functionality for automating tasks. I've made significant progress, achieving decent accuracy, but I imposed some self-imposed constraints to implement certain parts from scratch to deliver a single binary deployable solution, which means I still have work to do (audio processing is new territory for me). However, I'm optimistic about its potential.
That being said, I think the more straightforward approach would be to utilize an existing library like https://github.com/collabora/WhisperLive/ within a Docker container. This way, you can call it via WebSocket and integrate it with my LLM, which could also serve as a nice feature in your product.