
177 points akadeb | 4 comments

Hi HN! Last year, the project I launched here got a lot of good feedback on building speech-to-speech AI on the ESP32. Recently I revamped the whole stack, iterated on that feedback, and made the project fully open source: all of the client, hardware, and firmware code.

This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none of them gets speech-to-speech right. OpenAI launched an embedded repo late last year that sets up WebRTC with ESP-IDF. However, it's not beginner-friendly and doesn't have a server-side component for business logic.

This repo is an attempt at solving the above pains and creating a great speech-to-speech experience on Arduino. It uses secure WebSockets and edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency.
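
To make the relay idea concrete, here is a minimal sketch of the translation step an edge function could perform between the device's binary audio frames and the Realtime API's JSON events. It is not taken from the repo: the helper names (`deviceFrameToEvent`, `eventToDeviceFrame`) are hypothetical, and the event names (`input_audio_buffer.append`, `response.audio.delta`) are assumed from the beta Realtime API docs, so check them against the API version you target.

```typescript
// Hypothetical relay helpers; names are illustrative, not from the repo.

// Client event shape, assumed from the beta OpenAI Realtime API docs.
interface AppendEvent {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded PCM16 audio
}

// Encode raw bytes as base64 using the standard btoa global.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Decode base64 back into raw bytes using the standard atob global.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}

// Device -> API: wrap a raw PCM frame from the ESP32 in a JSON event.
function deviceFrameToEvent(frame: Uint8Array): string {
  const event: AppendEvent = {
    type: "input_audio_buffer.append",
    audio: bytesToBase64(frame),
  };
  return JSON.stringify(event);
}

// API -> device: extract audio bytes from a server event.
// Returns null for any event that does not carry audio.
function eventToDeviceFrame(message: string): Uint8Array | null {
  const event = JSON.parse(message);
  if (event.type === "response.audio.delta" && typeof event.delta === "string") {
    return base64ToBytes(event.delta);
  }
  return null;
}
```

In a setup like this, the microcontroller only ever sends and receives binary frames, while the base64 and JSON handling stays on the server, which is also where the API key would live.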

1. mcdow No.43763202
Dude, this is super cool! What made you decide to open source it?

I had a similar idea that I never followed through with (even down to using an ESP).

Basically, you could make a Harry Potter-style talking painting with your device plus an e-ink display that shows a 3D-modeled character.

For others, here’s a direct link to a demo video:

https://m.youtube.com/watch?v=o1eIAwVll5I

replies(2): >>43763230 #>>43764042 #
2. Sean-Der No.43763230
I get a `Request has expired` error. Could you upload it somewhere else?
replies(1): >>43763328 #
3. mcdow No.43763328
My bad! Updated the link.
4. magixx No.43764042
I also thought about this but wanted to look into an ESP32 CAM to get vision working. For better or worse, I didn't pursue the idea because I figured that repurposing a cell phone would be better overall.

I do wonder if the cellphone/app argument is why we didn't see that many hardware LLM API wrappers up until now. The rabbit R1 was basically just that.

I've seen more products in this space recently, such as Ropet [1], LOOI [2], and others. For now, though, it's going to be costly for companies to sell such a product at a fixed price, and I think a subscription model would be a hard sell [3] for consumers.

[1] https://www.kickstarter.com/projects/1067657324/ropet-your-n...
[2] https://looirobot.com/products/looi-robot?variant=4909200762...
[3] https://tech.yahoo.com/ai/articles/tragic-robot-shutdown-sho...