
177 points | akadeb | 1 comment

Hi HN! Last year, the project I launched here got a lot of good feedback on building speech-to-speech AI on the ESP32. I recently revamped the whole stack, iterated on that feedback, and made the project fully open source: all of the client, hardware, and firmware code.

This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets Speech-To-Speech right. OpenAI launched an embedded repo late last year that sets up WebRTC with ESP-IDF. However, it's not beginner-friendly and doesn't have a server-side component for business logic.

This repo is an attempt at solving the above pains and creating a great speech-to-speech experience on Arduino with secure WebSockets, using edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency.
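
To make the relay idea concrete, here is a minimal sketch of one step an edge function in this kind of setup might perform: wrapping a raw audio frame received from the device into a JSON event for the upstream speech API. The `input_audio_buffer.append` event type and its base64 `audio` field follow the OpenAI Realtime API; the helper names, and the assumption that the device sends raw binary PCM frames, are illustrative and not taken from the repo.

```typescript
// Hypothetical helper names; only the event shape follows the OpenAI Realtime API.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Runtime-independent base64 encoder, so the sketch does not rely on
// Deno-, Node-, or browser-specific helpers.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0; // missing bytes are treated as zero padding
    const b2 = bytes[i + 2] ?? 0;
    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += i + 1 < bytes.length
      ? B64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6))
      : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Wrap one binary audio frame from the device as a JSON text message
// that could be forwarded on the upstream WebSocket.
function toAppendEvent(pcmFrame: Uint8Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: toBase64(pcmFrame),
  });
}
```

Keeping this step as a pure function makes it easy to check separately from the socket handling, which is where most of the device-specific behavior would live.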

tantalor ◴[] No.43763970[source]
I'm surprised by the overwhelmingly positive vibes in the comments here.

Maybe I'm alone? To me, this comes across as extremely creepy, the exact opposite of what we should desire from AI in products aimed at children.

replies(7): >>43764077 #>>43764125 #>>43764168 #>>43764189 #>>43764195 #>>43764294 #>>43772666 #
1. akadeb ◴[] No.43772666[source]
The Elato toy is currently not aimed at children. The current version has adult characters that are entertaining and fun to engage with, like the Chad Brew Barkley character in the videos. I put up more such funny videos on my TikTok: tiktok.com/@elatoai

However, while testing it with a friend who has a 5-year-old daughter, I added a `Story mode` feature to create dynamic stories for her, which she enjoys.

I think what would be even cooler is if each character in a story had a unique voice (like the voice of an ogre, the voice of an elf, etc.), which is currently unsupported in the single WebSocket connection.
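
One way to think about that limitation: if a connection carries a single voice, a multi-voice story has to be split into runs of lines that share a voice, so each run can go to a session set up for that voice. Below is a rough sketch of just that planning step. The `Name: line` script format, the `planSegments` function, and the voice labels are all made up for illustration and are not from the Elato code.

```typescript
// Hypothetical types and names, for illustration only.
type Segment = { voice: string; text: string };

// Group consecutive script lines into runs that share the same voice.
function planSegments(
  script: string,
  voices: Map<string, string>, // character name -> voice label
  fallbackVoice: string, // used for narration and unknown speakers
): Segment[] {
  const segments: Segment[] = [];
  for (const raw of script.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const colon = line.indexOf(":");
    const name = colon > 0 ? line.slice(0, colon).trim() : "";
    const mapped = voices.get(name);
    // Only strip the "Name:" prefix when the name is a known character,
    // so narration that happens to contain a colon stays intact.
    const voice = mapped ?? fallbackVoice;
    const text = mapped !== undefined ? line.slice(colon + 1).trim() : line;
    const last = segments[segments.length - 1];
    if (last !== undefined && last.voice === voice) {
      last.text += " " + text; // same voice as the previous line: extend the run
    } else {
      segments.push({ voice, text });
    }
  }
  return segments;
}

// Example: narration, two ogre lines (merged into one run), then an elf line.
const exampleVoices = new Map([["Ogre", "deep"], ["Elf", "bright"]]);
const examplePlan = planSegments(
  "Once upon a time.\nOgre: Who goes there?\nOgre: Speak!\nElf: Only me.",
  exampleVoices,
  "narrator",
);
console.log(examplePlan.length); // 3
```

How each run would then be voiced (separate connections, or reconfiguring one session between runs) is left open here, since that depends on what the speech API allows.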