The era of open voice assistants

(www.home-assistant.io)
879 points by _Microft | 10 comments
lxe No.42468351
Here's what I'm looking for in a voice assistant:

- Full privacy: nothing goes to the "cloud"

- Non-shitty microphones and processing: I want to be heard without having to yell, repeat myself, or correct it

- No wake words: it should listen to everything, process it, and understand when it's being addressed. Since everything is private and local, this is now doable

- Conversational: it should understand when I've finished talking and be interruptible, all with low latency

- Non-stupid: it's 2024, and Alexa, Siri, and Google are somehow absolutely abysmal at even the basics

- Complete: I don't want to use an app to get stuff configured. I want everything to be controlled via voice
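The "Conversational" item depends on endpointing: deciding that the speaker has actually stopped. Below is a minimal sketch, assuming audio has already been cut into fixed-length frames (say 20 ms) with one energy value per frame. The function name and the `threshold` and `hangover_frames` defaults are illustrative assumptions, not tuned values; real assistants typically use a trained voice-activity detector rather than raw energy.

```python
from typing import Iterable, Optional


def find_endpoint(frame_energies: Iterable[float],
                  threshold: float = 0.1,
                  hangover_frames: int = 25) -> Optional[int]:
    """Return the frame index at which the utterance is judged finished.

    Returns None if the speaker never started or is still talking.
    An utterance ends once `hangover_frames` consecutive frames fall
    below `threshold` after at least one speech frame has been seen.
    (Energy thresholding is an illustrative stand-in for a real VAD.)
    """
    speech_started = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            speech_started = True
            silent_run = 0  # any speech resets the silence counter
        elif speech_started:
            silent_run += 1
            if silent_run >= hangover_frames:
                return i
    return None
```

With 20 ms frames, `hangover_frames=25` means waiting for half a second of silence. Shortening it lowers latency at the cost of cutting people off during natural pauses.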

replies(5): >>42468394 >>42468471 >>42468967 >>42470013 >>42471806
1. danparsonson No.42468394
> No wake words: it should listen to everything, process it, and understand when it's being addressed

Even humans struggle with this one - that's what names are for!

replies(2): >>42468438 >>42481564
2. antonyt No.42468438
Yeah, I’m having a hard time imagining how no-wake-word could work in practice.
replies(3): >>42468837 >>42470838 >>42473855
3. fragmede No.42468837
After setting up the system, if I say "turn the ceiling lights to 20%", who else would be changing the lights?

But also, a postfix wake word would feel natural if it were recording all the time: "turn on the lights, Google", for instance.
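If the device already transcribes everything locally, a postfix wake word reduces to checking the end of each finished utterance: when it ends with the assistant's name, the words before it are the command. A minimal sketch, assuming a speech-to-text stage that already yields utterances as plain strings; `WAKE_WORDS` and `extract_postfix_command` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical set of names the assistant answers to.
WAKE_WORDS = ("google", "computer", "assistant")


def extract_postfix_command(utterance: str) -> Optional[str]:
    """Return the command if the utterance ends with a wake word, else None."""
    # Lowercase and drop punctuation so "the lights, Google." still matches.
    words = re.findall(r"[a-z0-9%']+", utterance.lower())
    if len(words) < 2 or words[-1] not in WAKE_WORDS:
        return None
    # Everything spoken before the name is treated as the command.
    return " ".join(words[:-1])
```

Checking `words[0]` instead gives the usual prefix form; a device that transcribes continuously could run both checks on every utterance.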

replies(2): >>42472751 >>42476125
4. ethbr1 No.42470838
Like that really annoying friend who jumps in every other sentence with "Well actually..."
replies(1): >>42472167
5. marcosdumay No.42472167
I have a coworker who set up an Alexa a year or so ago. I don't know what the issue was, but it would jump into Teams meetings after every noise in his house.
6. TheCoelacanth No.42472751
Someone in a TV show that you're watching?
replies(1): >>42479383
7. lukifer No.42473855
This is one advantage of a system with a constrained set of commands/grammars, as opposed to the Alexa/Siri model of trying to process all arbitrary text while in active mode. It can simply ignore or discard any invocations that don't match those specific grammars (and there's no need to wait to confirm that the device is awake).

"Computer, turn lights to 50%" -> "turn lights to fifty percent" -> {action: "lights", value: 50}

"My new computer has a really beefy graphics card" -> "has a really beefy graphics card" -> {action: null}
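The constrained-grammar idea can be sketched as a small table of patterns that the text after the wake word must match in full; anything else maps to `{action: None}`. This is a minimal sketch, assuming the speech-to-text stage has already produced the transcript. The patterns, the number-word table, and `parse_command` are hypothetical and only mirror the two examples above; they are not any particular assistant's API.

```python
import re

# Hypothetical vocabulary: enough number words for the example, not a full parser.
NUMBER_WORDS = {
    "zero": 0, "ten": 10, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80,
    "ninety": 90, "one hundred": 100,
}

# Each grammar is a regex that must match the whole command, plus its action.
GRAMMARS = [
    (re.compile(r"turn (?:the )?lights to (?P<value>[a-z0-9 ]+?)(?:%| percent)"),
     "lights"),
]


def parse_command(transcript: str, wake_word: str = "computer") -> dict:
    text = transcript.lower().strip()
    # Keep only what follows the wake word (naive substring match, so
    # "supercomputer" would also count; fine for a sketch).
    _, found, rest = text.partition(wake_word)
    if not found:
        return {"action": None}
    rest = re.sub(r"^[\s,]+", "", rest)  # drop the ", " after "Computer"
    for pattern, action in GRAMMARS:
        match = pattern.fullmatch(rest)
        if match:
            raw = match.group("value")
            value = int(raw) if raw.isdigit() else NUMBER_WORDS.get(raw)
            if value is not None and 0 <= value <= 100:
                return {"action": action, "value": value}
    # Anything that doesn't match a grammar is silently discarded.
    return {"action": None}
```

Because only full matches count, ordinary conversation that happens to contain the wake word falls through to `{"action": None}` instead of being guessed at.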

replies(1): >>42475451
8. danparsonson No.42476125
Sure, if the system is set up to only respond to very specific commands that humans would not respond to, I guess that could work. I was thinking more about the other way around, where a person might speak to someone else in the room and be overheard and acted upon - "turn on the lights!" could be a command for the computer controlling the room, or the human standing next to the Christmas tree, for example.
9. joshstrange No.42479383
I’ve never had Alexa control a device via a TV show’s audio, but playing back a video of me testing my home automation (“Alexa, do X”) triggered my lights.

I’d love a no-wake-word world where something local was always chewing on what you said, but I’m not sure how well it would work in practice.

I think it would only take 1-2 instances of it hearing “Hey, who turned off the lights?” in a show and turning off my lights for real (and scaring the crap out of me). Doctor Who isn’t particularly scary, but if I were watching Silence in the Library and that line turned off my lights, I’d be spooked, and it would take me a hot minute to realize what happened.

10. lxe No.42481564
Wake words are different from "listen to everything until the name is called". A wake word is needed for both privacy and technical reasons: you can't just have Alexa beaming everything it hears to Amazon. So instead it uses a local, lightweight, "dumb" system that listens for specific words only.

That's exactly why there are massive latencies between command recognition, processing, and execution.

Imagine if it had sub-ms response to "assistant, add uuh eggs and milk to the shopping list... actually no just eggs sorry"
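The wake-word arrangement described above can be sketched as a small gate: a cheap local check watches for one fixed word, everything before it is dropped on the device, and only a bounded window after it is handed to the expensive recognizer. A minimal sketch, assuming the audio has already been reduced to a stream of word tokens (a real detector works on audio features, not words); `WakeWordGate`, `forward`, and `max_tokens` are hypothetical names.

```python
from typing import Callable, Iterable, List


class WakeWordGate:
    """Drop everything until the wake word is heard, then forward a bounded window."""

    def __init__(self, wake_word: str, forward: Callable[[List[str]], None],
                 max_tokens: int = 12) -> None:
        self.wake_word = wake_word
        self.forward = forward        # stands in for the expensive (cloud) recognizer
        self.max_tokens = max_tokens  # how long the gate stays open after triggering
        self._window: List[str] = []
        self._open = False

    def feed(self, tokens: Iterable[str]) -> None:
        for token in tokens:
            if not self._open:
                # Cheap local check: nothing but the wake word is inspected.
                if token.lower() == self.wake_word:
                    self._open = True
                continue
            self._window.append(token)
            if len(self._window) >= self.max_tokens:
                self.flush()

    def flush(self) -> None:
        """Close the gate, e.g. when an endpointer decides the utterance ended."""
        if self._window:
            self.forward(self._window)
        self._window = []
        self._open = False
```

Nothing reaches `forward` until the trigger fires, which is the privacy property; the round trip to the recognizer only starts after that, which is where the latency comes from.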