(github.com)

177 points akadeb | 1 comments | 22 Apr 25 14:10 UTC

Hi HN! Last year I launched a project here for building speech-to-speech AI on the ESP32, and it got a lot of good feedback. Recently I revamped the whole stack, iterated on that feedback, and made the project fully open-source: all of the client, hardware, and firmware code.

This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none of them gets Speech-To-Speech right. OpenAI launched an embedded repo late last year that sets up WebRTC with ESP-IDF. However, it's not beginner-friendly and doesn't have a server-side component for business logic.

This repo is an attempt at solving those pain points and creating a great speech-to-speech experience on Arduino with secure WebSockets, using edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency.


hakaneskici ◴[22 Apr 25 15:29 UTC] No.43763345[source]▶

>>43762409 (OP) #

Amazing, thank you for sharing. I'm interested in learning about your experience while building this :)

What kind of interesting challenges have you run into, and how has your work influenced OpenAI's Realtime API?

PS: Your GitHub README is quite well crafted, which is hard to come across nowadays.

replies(2): >>43763574 #>>43773449 #

1. akadeb ◴[23 Apr 25 15:44 UTC] No.43773449[source]▶

>>43763345 #

Thank you! It's been super fun to work on. The challenges were more on the ESP32 side, like getting audio to work smoothly with Opus and dealing with audio timing. This is one of the reasons I open-sourced it.

It seems pointless to expect everyone to cross that C++/audio barrier just to make something cool. Using this cuts a lot of dev time and gets products to market way quicker. The repo basically helps you launch your AI toy brand.
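To give a feel for the audio timing arithmetic mentioned above, here is a minimal sketch of how PCM frame sizes follow from sample rate and frame duration. The function names are hypothetical, and the 24 kHz, 16-bit mono, 20 ms figures are illustrative assumptions, not values taken from the repo. The lists of sample rates and frame durations are the ones Opus accepts.

```python
# Sample rates (Hz) and frame durations (ms) that Opus accepts.
OPUS_SAMPLE_RATES = (8000, 12000, 16000, 24000, 48000)
OPUS_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def pcm_frame_size(sample_rate_hz, frame_ms, channels=1, bytes_per_sample=2):
    """Return (samples_per_channel, total_bytes) for one uncompressed PCM frame.

    Hypothetical helper for illustration; defaults assume 16-bit mono audio.
    """
    if sample_rate_hz not in OPUS_SAMPLE_RATES:
        raise ValueError(f"unsupported Opus sample rate: {sample_rate_hz}")
    if frame_ms not in OPUS_FRAME_MS:
        raise ValueError(f"unsupported Opus frame duration: {frame_ms}")
    samples = int(sample_rate_hz * frame_ms / 1000)
    return samples, samples * channels * bytes_per_sample


def buffered_ms(buffer_bytes, sample_rate_hz, channels=1, bytes_per_sample=2):
    """Return how many milliseconds of PCM audio a buffer of this size holds."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return buffer_bytes * 1000 / bytes_per_second


if __name__ == "__main__":
    samples, size = pcm_frame_size(24000, 20)
    print(f"20 ms at 24 kHz mono: {samples} samples, {size} bytes")
    print(f"9600 bytes buffered: {buffered_ms(9600, 24000)} ms of audio")
```

On a microcontroller the decoded frame has to be handed to the audio output before the previous one finishes playing, so numbers like these set how much buffering is needed to avoid gaps.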


Show HN: I open-sourced my AI toy company that runs on ESP32 and OpenAI realtime