What kind of interesting challenges have you run into, and how has your work influenced OpenAI's Realtime API?
PS: Your GitHub README is quite well crafted, which is hard to come across nowadays.
This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.
I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets speech-to-speech right. OpenAI launched an embedded repo late last year that sets up WebRTC with ESP-IDF. However, it's not beginner friendly and doesn't have a server-side component for business logic.
This repo is an attempt at solving the above pains and creating a great speech-to-speech experience on Arduino over secure WebSockets, using edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency.
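To make the server-side piece concrete, here is a minimal sketch of what such a Deno edge relay can look like. This is an illustration, not the repo's actual code: the file path, model name, and the subprotocol-based auth (which OpenAI documents for WebSocket clients that cannot set HTTP headers) are assumptions on my part.

```ts
// supabase/functions/relay/index.ts (hypothetical path)
// Minimal sketch: accept a WSS connection from the ESP32 and proxy frames
// to the OpenAI Realtime API, keeping the API key server-side.

// Assumed model name; check the Realtime API docs for current ones.
const OPENAI_URL =
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview";

Deno.serve((req) => {
  if (req.headers.get("upgrade")?.toLowerCase() !== "websocket") {
    return new Response("Expected WebSocket upgrade", { status: 426 });
  }

  const { socket: device, response } = Deno.upgradeWebSocket(req);

  // Auth via subprotocols, since the standard WebSocket constructor
  // cannot attach an Authorization header.
  const upstream = new WebSocket(OPENAI_URL, [
    "realtime",
    `openai-insecure-api-key.${Deno.env.get("OPENAI_API_KEY")}`,
    "openai-beta.realtime-v1",
  ]);

  // Buffer device frames that arrive before the upstream socket opens.
  const queue: (string | ArrayBuffer)[] = [];
  upstream.onopen = () => {
    queue.forEach((m) => upstream.send(m));
    queue.length = 0;
  };

  // Forward frames in both directions.
  device.onmessage = (e) => {
    if (upstream.readyState === WebSocket.OPEN) upstream.send(e.data);
    else queue.push(e.data);
  };
  upstream.onmessage = (e) => {
    if (device.readyState === WebSocket.OPEN) device.send(e.data);
  };

  // Tear down both sides together.
  device.onclose = () => upstream.close();
  upstream.onclose = () => device.close();

  return response;
});
```

On the device side, the Arduino WebSockets client would then connect to this relay's WSS URL instead of talking to OpenAI directly, so the API key never ships on the ESP32 and business logic can live at the edge.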
Not the first time I've run into it, but I didn't bother commenting.
I can recognize it from far away. Thankfully I'm not the only one.
Thanks for elaborating!