
177 points akadeb | 2 comments | 22 Apr 25 14:10 UTC

Hi HN! Last year the project I launched here got a lot of good feedback on creating speech-to-speech AI on the ESP32. Recently I revamped the whole stack, iterated on that feedback, and made our project fully open source: all of the client, hardware, and firmware code.

This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets Speech-To-Speech right. OpenAI launched an embedded-repo late last year which sets up WebRTC with ESP-IDF. However, it's not beginner friendly and doesn't have a server-side component for business logic.

This repo is an attempt at solving those pain points and creating a great speech-to-speech experience on Arduino, using secure WebSockets and edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency.
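
To make the architecture above concrete, here is a minimal sketch of the relay idea: an edge function sits between the device and the speech API and forwards messages in both directions, so the API key and any business logic stay on the server. This is not the repo's code. `SocketLike` and `relay` are hypothetical names, and a real edge function would also handle authentication and audio encoding.

```typescript
// Hypothetical minimal socket shape; a standard WebSocket exposes similar members.
interface SocketLike {
  send(data: string | ArrayBuffer): void;
  close(): void;
  onmessage: ((event: { data: string | ArrayBuffer }) => void) | null;
  onclose: (() => void) | null;
}

// Forward every message from one side to the other, and close both
// sides as soon as either one closes.
function relay(device: SocketLike, upstream: SocketLike): void {
  let closed = false;
  const closeBoth = () => {
    if (closed) return; // guard against close handlers re-triggering each other
    closed = true;
    device.close();
    upstream.close();
  };

  device.onmessage = (event) => upstream.send(event.data);
  upstream.onmessage = (event) => device.send(event.data);
  device.onclose = closeBoth;
  upstream.onclose = closeBoth;
}
```

In this sketch the device only ever talks to the edge function, which is the place where per-user logic such as character selection could be applied before audio is passed upstream.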


justanotheratom ◴[22 Apr 25 15:57 UTC] No.43763640[source]▶

>>43762409 (OP) #

This is quite cool. Two questions:

- Why do you need a nextjs frontend for what looks like a headless use case?
- How much would the OpenAI bill be for 15 minutes of usage per day?

replies(3): >>43763849 #>>43763907 #>>43763946 #

1. irq-1 ◴[22 Apr 25 16:27 UTC] No.43763907[source]▶

>>43763640 #

> This equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output.

https://openai.com/index/introducing-the-realtime-api/
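
As a rough answer to the 15-minutes-per-day question, the quoted rates can be turned into a back-of-the-envelope estimate. The sketch below assumes the rates quoted above still apply, ignores text tokens and any other charges, and treats the split between listening (input) and speaking (output) as a free parameter; the 50/50 split is only an assumption. `dailyCostCents` and `formatDollars` are hypothetical helper names.

```typescript
// Quoted rates, expressed in cents per minute to keep the arithmetic exact.
const INPUT_CENTS_PER_MIN = 6; // $0.06 per minute of audio input
const OUTPUT_CENTS_PER_MIN = 24; // $0.24 per minute of audio output

// outputShare is the fraction of the conversation spent on model speech (0 to 1).
function dailyCostCents(minutesPerDay: number, outputShare: number): number {
  const outputMinutes = minutesPerDay * outputShare;
  const inputMinutes = minutesPerDay - outputMinutes;
  return inputMinutes * INPUT_CENTS_PER_MIN + outputMinutes * OUTPUT_CENTS_PER_MIN;
}

function formatDollars(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

const perDay = dailyCostCents(15, 0.5); // assumed even input/output split
console.log(formatDollars(perDay)); // $2.25 per day
console.log(formatDollars(perDay * 30)); // $67.50 per 30-day month
```

Under the same assumptions the bounds are $0.90 per day if all 15 minutes were input and $3.60 per day if all were output.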

About the nextjs site, I was thinking maybe it's difficult to have Supabase hold long connections, or route the response? I'm curious too.

replies(1): >>43764241 #

2. akadeb ◴[22 Apr 25 17:06 UTC] No.43764241[source]▶

>>43763907 (TP) #

The long connections are ultimately handled by Deno Edge, so the site isn't used there. The NextJS frontend (which could also be an iOS/Android app) provides an interface to select a character, create AI characters, set the ESP32 volume, and view conversation history.


Show HN: I open-sourced my AI toy company that runs on ESP32 and OpenAI realtime