This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.
I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets speech-to-speech right. OpenAI launched an embedded repo late last year that sets up WebRTC with ESP-IDF. However, it's not beginner-friendly and doesn't have a server-side component for business logic.
This repo is an attempt at solving the above pains and creating a great speech-to-speech experience on Arduino with secure WebSockets, using edge servers (with Deno/Supabase Edge Functions) for fast global connectivity and low latency.
IMO this is only exacerbated by how little children (who are presumably the target audience for stuffed animals that talk) often don't follow "normal" patterns of conversation or topics, so it feels like it'd be hard to accurately simulate/test the ways in which unexpected and undesirable responses could come out.
Essentially, telling kids the truth before they're ready and without typical parental censorship? Or is there some other fear, like the AI will get compromised by a pedo and he'll talk your kid into who knows what? Or similar for "fill in state actor" using mind control on your kid (which, honestly, I feel is normalized even for adults; e.g. Fox News, etc., again US-centric)?
> telling kids the truth before they're ready and without typical parental censorship
Does AI today reliably respond with "the truth"? There are countless documented incidents of even full-grown, extremely well-educated adults (e.g. lawyers) believing well-phrased hallucinations. Kids, and particularly small kids who haven't yet had much education about critical thinking and what to believe, have no chance. Conversational AI today isn't an uncensored search engine into a set of well-reasoned facts; it's an algorithm constructing a response based on what it's learned people on the internet want to hear, with no real concept of what's right or wrong, and no foundational set of knowledge about the world to contrast with and validate against.
> what exactly is the fear
Being fed reliable-sounding misinformation is one. Another is being used for emotional support (which kids do even with non-talking stuffed animals), when the AI has no real concept of how to emotionally support a kid and could just as easily do the opposite. I guess overall, the concern is having a kid spend a large amount of time talking to "someone" who sounds very convincing, has no real sense of morality or truth, and can potentially distort their world view in negative ways.
And yeah, there's also exposing kids to subjects they're in no way equipped to handle yet, or encouraging them to do something that would result in harm to themselves or to others. Kids are very suggestible, and it takes a long while for them to develop a real understanding of the consequences of their actions.