
177 points akadeb | 2 comments

Hi HN! Last year the project I launched here got a lot of good feedback on creating speech-to-speech AI on the ESP32. Recently I revamped the whole stack, iterated on that feedback, and made our project fully open source: all of the client, hardware, and firmware code.

This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful text-to-speech (TTS) and speech-to-text (STT) repos out there, I believe none gets speech-to-speech right. OpenAI launched an embedded repo late last year that sets up WebRTC with ESP-IDF. However, it's not beginner-friendly and doesn't have a server-side component for business logic.

This repo is an attempt at solving those pain points and creating a great speech-to-speech experience on Arduino with secure WebSockets, using edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency.

empath75 No.43763312
When someone figures this out, it's going to be a multi-billion-dollar company, but the safety concerns around actually putting something like this into the hands of children are unbelievable.
mithr No.43763562
This. The idea is super cool in theory! But given how these sorts of things work today, a toy that can hold an independent conversation with a kid, and that isn't guaranteed to stay within its "sandbox" despite the best intentions of the prompt writer, is terrifying enough to probably not be worth the risk.

IMO this is only exacerbated by the fact that little children (who are presumably the target audience for talking stuffed animals) often don't follow "normal" patterns of conversation or topics, so it seems like it would be hard to accurately simulate or test the ways in which unexpected and undesirable responses could come out.

conductr No.43763975
I'm trying to use my imagination, but what exactly is the fear? Perhaps the AI will explain where babies come from in graphic detail before the parent is ready to have that conversation, or something similar? Or, for those of us in the US, maybe it tells your kid they should wear a bulletproof vest to pre-K instead of bringing a stuffy for naptime?

Essentially, telling kids the truth before they're ready and without typical parental censorship? Or is there some other fear, like the AI getting compromised by a pedophile who talks your kid into who knows what? Or, similarly, some state actor (fill in the blank) using mind control on your kid (which, honestly, I feel is normalized even for adults, e.g. Fox News; again, US-centric)?

mithr No.43764946
I'll respond to the content, because I think there are some genuine questions amongst the condescension and jumping to conclusions.

> telling kids the truth before they're ready and without typical parental censorship

Does AI today reliably respond with "the truth"? There are countless documented incidents of even full-grown, extremely well-educated adults (e.g. lawyers) believing well-phrased hallucinations. Kids, and particularly small kids who haven't yet had much education about critical thinking and what to believe, have no chance. Conversational AI today isn't an uncensored search engine over a set of well-reasoned facts; it's an algorithm constructing a response based on what it has learned people on the internet want to hear, with no real concept of right or wrong and no foundational body of knowledge about the world to compare against and validate with.

> what exactly is the fear

Being fed reliable-sounding misinformation is one. Another is being used for emotional support (which kids do even with non-talking stuffed animals), when the AI has no real concept of how to emotionally support a kid and could just as easily do the opposite. I guess overall, the concern is having a kid spend a large amount of time talking to "someone" who sounds very convincing, has no real sense of morality or truth, and can potentially distort their world view in negative ways.

And yeah, there's also exposing kids to subjects they're in no way equipped to handle yet, or encouraging them to do something that would harm themselves or others. Kids are very suggestible, and it takes a long while for them to develop a real understanding of the consequences of their actions.

conductr No.43765991
Bravo, this is an answer that goes beyond outright fearmongering, actually makes sense, and raises points I wasn't considering. I still struggle to see how it's much different from social media in terms of shaping what kids believe and their perception of reality, but I do get what you're saying: this could be next-level dangerous because kids would believe what it says without much critical thinking.