The era of open voice assistants

1. nickthegreek ◴[20 Dec 24 04:08 UTC] No.42468203[source]▶

And on back order everywhere. I just spent the last 2 weeks getting an esp32-s3-box set up to do this, but its lack of audio out really irks me.

replies(3): >>42468583 #>>42468794 #>>42469077 #

2. joshstrange ◴[20 Dec 24 05:40 UTC] No.42468583[source]▶

>>42468203 (TP) #

And the mic is not all that great either. I have a couple of them but they just weren't reliably picking up my voice and I couldn't hear the reply either (when it did hear me). I figured it would be easy to add a speaker to them but that sent me down a rabbit hole that I gave up on and put them in a drawer. I'll buy this for sure though because when the ESP32 box thing worked it worked really well and I loved being able to swap out parts of the assist pipeline.

replies(2): >>42471252 #>>42472587 #

3. yzydserd ◴[20 Dec 24 06:33 UTC] No.42468794[source]▶

>>42468203 (TP) #

> And on back order everywhere.

I just clicked through to my large country and the first vendor and was able to buy 2 for delivery tomorrow. So it says. So maybe not on back order everywhere.

4. sofixa ◴[20 Dec 24 07:43 UTC] No.42469077[source]▶

>>42468203 (TP) #

If it's an ESP32-S3-BOX-3, there is audio out (assuming you mean being able to send arbitrary audio to it to play). Due to the framework used it's not available, but there's an alternative firmware available on GitHub that uses the newer framework and it exposes a media player entity you can send any audio to.

replies(1): >>42471265 #
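
For anyone wondering what "send any audio to" a media player entity can look like in practice, here is a minimal sketch using Home Assistant's REST API to call the `media_player.play_media` service. The host name, token, entity ID, and audio URL are all placeholders I made up, not values from this thread; the entity name your firmware exposes will differ.

```python
import json
import urllib.request


def build_play_media_request(base_url, token, entity_id, media_url):
    """Build (but do not send) a REST call to media_player.play_media."""
    payload = {
        "entity_id": entity_id,          # hypothetical entity name
        "media_content_id": media_url,   # any audio URL the device can reach
        "media_content_type": "music",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/media_player/play_media",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All four arguments below are assumed placeholder values.
    req = build_play_media_request(
        "http://homeassistant.local:8123",
        "YOUR_LONG_LIVED_TOKEN",
        "media_player.esp32_s3_box_3",
        "http://example.com/chime.mp3",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes it easy to inspect before pointing it at a real instance.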

5. nickthegreek ◴[20 Dec 24 14:18 UTC] No.42471252[source]▶

>>42468583 #

I ended up modding the s3 yaml to turn off the internal speaker and to forward all voice responses to a Google hub.

6. nickthegreek ◴[20 Dec 24 14:20 UTC] No.42471265[source]▶

>>42469077 #

I didn’t have the -3 version. Learned the hard way after loading up that alt framework last week and the screen went blank. I did end up implementing that same solution on my hardware though.

7. alias_neo ◴[20 Dec 24 16:44 UTC] No.42472587[source]▶

>>42468583 #

To be fair, the issue with the Box-3 is HA's implementation; I used it with heywillow.io and it was incredible, I could speak to it from another room and it would pick up perfectly.

The audio out is terrible, so I wrote a shim-server that captures the request to the TTS server for heywillow and sends it to a speaker I built myself, running MPD on a Pi with a nice DAC, which plays the responses instead of the box-3's tiny speaker.

I don't expect the audio-out on this to be much better with its tiny speaker, but at least it has a 3.5mm jack.

I'm going to look into what that Grove port can do too and perhaps build a new speaker "module" that the Voice PE can sit on top of to make it a proper music device.
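
The MPD half of a shim like the one described above could be sketched roughly as follows. This is not the commenter's code: it only covers handing an audio URL to MPD over its plain-text TCP protocol, and it assumes the shim has already worked out a URL for the TTS audio. The function names, host, and URL are hypothetical.

```python
import socket


def mpd_quote(arg):
    """Quote an argument for MPD's text protocol (escape backslash and quote)."""
    return '"' + arg.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_mpd_commands(audio_url):
    """Commands that replace the queue with one URL and start playback."""
    return ["clear", f"add {mpd_quote(audio_url)}", "play"]


def send_to_mpd(commands, host="localhost", port=6600, timeout=5.0):
    """Send commands one at a time; raise if MPD replies with an ACK error."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        greeting = stream.readline()
        if not greeting.startswith("OK MPD"):
            raise RuntimeError(f"unexpected greeting: {greeting!r}")
        for command in commands:
            stream.write(command + "\n")
            stream.flush()
            while True:
                line = stream.readline()
                if line == "" or line.startswith("ACK"):
                    raise RuntimeError(f"{command!r} failed: {line!r}")
                if line.rstrip("\n") == "OK":
                    break  # command accepted, move on to the next one


if __name__ == "__main__":
    # Assumed URL; in a real shim this would come from the captured TTS request.
    for cmd in build_mpd_commands("http://tts.local/response.wav"):
        print(cmd)
    # send_to_mpd(build_mpd_commands(...), host="pi-speaker.local")
```

Keeping the command building separate from the socket code means the quoting can be checked without a running MPD instance.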