
177 points akadeb | 2 comments

Hi HN! Last year the project I launched here got a lot of good feedback on creating speech-to-speech AI on the ESP32. Recently I revamped the whole stack, iterated on that feedback, and made our project fully open-source: all of the client, hardware, and firmware code.

This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets Speech-To-Speech right. OpenAI launched an embedded repo late last year which sets up WebRTC with ESP-IDF. However, it's not beginner-friendly and doesn't have a server-side component for business logic.

This repo is an attempt at solving the above pains and creating a great speech-to-speech experience on Arduino with secure WebSockets using edge servers (with Deno/Supabase Edge Functions) for fast global connectivity and low latency.
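
As a rough illustration of the relay idea (not code from the repo), here is a minimal TypeScript sketch of one step an edge function might perform: wrapping a chunk of 16-bit PCM audio from the device in a JSON event for the upstream speech API. It assumes the server forwards microphone audio as base64 inside an `input_audio_buffer.append` event, which is how the OpenAI Realtime API accepts audio as far as I know. The helper names `pcm16ToBytes`, `toBase64`, and `buildAppendEvent` are hypothetical.

```typescript
// Hypothetical helpers for illustration only; the repo's relay may differ.

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Serialize signed 16-bit samples as little-endian bytes (the usual PCM16 layout).
function pcm16ToBytes(samples: Int16Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true);
  }
  return bytes;
}

// Plain base64 encoder, written out so the sketch has no runtime dependencies.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 0x03) << 4) | (b1 >> 4)];
    out += has1 ? B64[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += has2 ? B64[b2 & 0x3f] : "=";
  }
  return out;
}

// Build the JSON text frame a relay could send upstream for one audio chunk.
// Assumption: the upstream API expects { type, audio } with base64 PCM16.
function buildAppendEvent(samples: Int16Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: toBase64(pcm16ToBytes(samples)),
  });
}
```

In a real relay, the output of `buildAppendEvent` would be sent over the server's upstream WebSocket each time a binary audio frame arrives from the ESP32. Keeping the encoding on the server means the device only has to stream raw bytes.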

hakaneskici ◴[] No.43763345[source]
Amazing, thank you for sharing. I'm interested in learning about your experience while building this :)

What kind of interesting challenges have you run into, and how has your work influenced OpenAI's Realtime API?

PS: Your GitHub README is quite well crafted, which is hard to come across nowadays.

replies(2): >>43763574 #>>43773449 #
reolbox ◴[] No.43763574[source]
This is an AI reply.
replies(2): >>43763684 #>>43763705 #
hakaneskici ◴[] No.43763684[source]
What made you think that?
replies(1): >>43763709 #
johnisgood ◴[] No.43763709[source]
The README seems like what GPT would spit out, with all the emojis, diagrams, etc.

Not the first time I ran into it, but I did not bother commenting.

I can recognize it from far away. Thankfully I am not the only one.

replies(2): >>43763739 #>>43773417 #
1. akadeb ◴[] No.43773417[source]
The emojis are all AI. The content is a mix of me and Cursor, and I added the Mermaid chart to make it easier to visualize the system diagram.

The circuit diagram is on Figma.

And the demo video was edited in CapCut.

replies(1): >>43774424 #
2. johnisgood ◴[] No.43774424[source]
It is fine. I use LLMs to generate stuff, too, and it wouldn't have the right content without me, similarly to yours.

Thanks for elaborating!