
The era of open voice assistants

(www.home-assistant.io)
878 points by _Microft | 1 comment
jfim ◴[] No.42468047[source]
That's a pretty timely release considering Alexa and the Google assistant devices seem to have plateaued or are on the decline.
replies(1): >>42468486 #
IgorPartola ◴[] No.42468486[source]
Curious what you mean by that.
replies(5): >>42468541 #>>42469172 #>>42470796 #>>42472284 #>>42474211 #
oaththrowaway ◴[] No.42468541[source]
For me the Alexa devices I own have gotten worse. They can't do simple things (setting a timer used to be instant; now it takes 10-15 seconds of thinking, assuming it heard properly), and playing music is a joke (it will try to play through Deezer even though I disabled that integration months ago, and then will default to Amazon Music instead of Spotify, which is set as the default).

And then even simple skills can't understand what I'm asking 60% of the time. For maybe the first 2 years after launch it seemed like everything worked pretty well, but since then it's been a frustrating decline.

Currently they are relegated to timers and music, and they can't even manage those half the time anymore.

replies(4): >>42468954 #>>42469393 #>>42470886 #>>42471275 #
lelag ◴[] No.42469393[source]
It is, I think, a common feeling among Echo/Alexa users. Now that people are getting used to the amazing understanding capabilities of ChatGPT and the like, it probably increases the frustration level, because you get a hint of how good it could be.

I believe it boils down to two main issues:

- The narrow AI systems used for intent inference have not scaled with the product features.

- Amazon is stuck and can't significantly improve it using general AI due to costs.

The first point is that the speech-to-intent algorithms currently in production are quite basic, likely based on the state of the art from 2013. Initially, there were few features available, so the device was fairly effective at inferring what you wanted from a limited set of possibilities. Over time, Amazon introduced more and more features to choose from, but the devices didn't get any smarter. As a result, mismatches between actual intent and inferred intent became more common, giving the impression that the device is getting dumber. In truth, it’s probably getting somewhat smarter, but not enough to compensate for the increasing complexity over time.
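
To make the first point concrete, here's a toy closed-set matcher in that 2013 style (an illustration only, not Amazon's actual code): every feature added to the catalog is another chance for the wrong intent to win.

    # Toy sketch of closed-set intent matching by keyword overlap.
    INTENTS = {
        "set_timer":      {"set", "timer", "minutes"},
        "play_music":     {"play", "music", "song"},
        "weather_report": {"weather", "forecast", "today"},
        # ...each new feature over the years adds another entry:
        "play_audiobook": {"play", "book", "chapter"},
    }

    def infer_intent(utterance: str) -> str:
        words = set(utterance.lower().split())
        # Pick the registered intent sharing the most words with the input.
        return max(INTENTS, key=lambda name: len(INTENTS[name] & words))

    # With three intents, "play some music" was unambiguous. Once
    # "play_audiobook" exists, a bare "play something" scores a tie,
    # resolved by dictionary order rather than by what you meant.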

The second point is that, clearly, it would be relatively straightforward to create a much smarter Alexa: simply delegate the intent detection to an LLM. However, Amazon can’t do that. By 2019, there were already over 100 million Alexa devices in circulation, and it’s reasonable to assume that number has at least doubled by now. These devices are likely sold at a low margin, and the service is free. If you start requiring GPUs to process millions of daily requests, you would need an enormous, costly infrastructure, which is probably impossible to justify financially—and perhaps even infeasible given the sheer scale of the product.
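
For the sake of illustration, delegating intent detection to a hosted LLM could look like the sketch below; the model name, prompt, and output format are my assumptions, not anything Amazon has shipped.

    # Hypothetical LLM-backed intent detection (illustrative only).
    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    SYSTEM = ('Map the request to JSON like '
              '{"intent": "set_timer", "slots": {"minutes": 10}}. '
              'Reply with JSON only.')

    def infer_intent(utterance: str) -> dict:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": utterance}],
        )
        return json.loads(resp.choices[0].message.content)

    # Free-form phrasing now works, but every timer request burns
    # GPU time in someone's datacenter: that is the cost problem.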

My prediction is that Amazon cannot save the product, and it will die a slow death. It will probably keep working for years but will likely be relegated by most users to a "dumb" device capable of little more than setting alarms, timers, and providing weather reports.

If you want Jarvis-like intelligence to control your home automation system, the vision of a local assistant using local AI on an efficient GPU, as presented by HA, is the one with the best chance of succeeding. Beyond the privacy benefits of processing everything locally, the primary reason this approach may become common is that it scales linearly with the number of installations.
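
Structurally, such a local pipeline is simple. A minimal sketch, assuming faster-whisper for local speech-to-text (the other stages are stand-ins, not Home Assistant's actual API):

    # Sketch of a fully local voice pipeline; nothing leaves the house.
    from faster_whisper import WhisperModel

    stt = WhisperModel("small", device="cuda")  # runs on the box's own GPU

    def infer_intent(text: str) -> dict:
        return {"intent": "unknown", "text": text}   # stand-in for a local LLM

    def execute(intent: dict) -> str:
        return f"Sorry, I can't do {intent['intent']} yet."   # stand-in

    def synthesize(reply: str) -> str:
        return reply   # stand-in for a local TTS engine such as Piper

    def handle_utterance(wav_path: str) -> str:
        segments, _ = stt.transcribe(wav_path)
        text = " ".join(seg.text for seg in segments)
        return synthesize(execute(infer_intent(text)))

The only "server" to scale is the one you already sold with the box.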

If you had a cloud-based solution using Echo-like devices, the problem is that you’d need to scale your cloud infrastructure as you sell more devices. If the service is good, this could become a major challenge. In contrast, if you sell an expensive box with an integrated GPU that does everything locally, you deploy the infrastructure as you sell the product. This eliminates scaling issues and the risks of growing too fast.
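
A back-of-envelope comparison makes the contrast concrete (every number below is an assumption, not a vendor figure):

    # Cloud-side GPU demand grows with every device sold.
    devices          = 10_000_000
    requests_per_day = 20
    gpu_secs_per_req = 2
    peak_factor      = 5   # traffic is bursty, not spread evenly

    gpu_secs = devices * requests_per_day * gpu_secs_per_req
    avg_gpus = gpu_secs / 86_400
    print(f"~{avg_gpus:,.0f} GPUs busy on average, "
          f"~{avg_gpus * peak_factor:,.0f} at peak")
    # ~4,630 average, ~23,148 at peak; it doubles when sales double.
    # The local box instead ships its share of compute inside the product.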

replies(3): >>42469542 #>>42470084 #>>42471009 #
stavros ◴[] No.42470084{3}[source]
I think the economics here are wrong by orders of magnitude. It doesn't make sense to deploy an expensive GPU to the home where it will sit idle 99% of the time, unless running an LLM gets much cheaper computationally. It's much cheaper to run it centrally and charge a subscription; otherwise nobody would pay for ChatGPT and everyone would have an LLM rig at home instead.
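
Rough numbers behind that intuition (prices and utilization assumed purely for illustration): at 1% utilization, roughly a hundred homes could share one datacenter GPU.

    # Illustrative hardware economics, not real prices.
    utilization    = 0.01     # a home assistant GPU is idle 99% of the time
    home_gpu_cost  = 1_500    # $ per household
    cloud_gpu_cost = 15_000   # $ for a bigger shared datacenter GPU

    homes_per_cloud_gpu = 1 / utilization           # ~100 homes share one
    print(cloud_gpu_cost / homes_per_cloud_gpu)     # $150/home vs $1,500/home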
replies(1): >>42470607 #
lelag ◴[] No.42470607{4}[source]
You are right, but that's not my point. My point is that it's difficult to scale a cloud product that requires a lot of AI workload.

Here, Home Assistant is telling you: you can use your own infra (most people won't), or you can use our cloud.

It works because the user base will most likely be rather small, and at that scale Home Assistant can treat cloud resources as if they were infinite.

If their product were amazing, and suddenly millions of people wanted to buy the cloud version, they would have a big problem: cloud infrastructure is never infinite at scale. They would be limited by how much compute their cloud provider is able or willing to sell them, rather than by how many of those small boxes they could sell, possibly losing the opportunity to corner the market with a great product.

If you package everything, you don't have that problem (you only have the problem of manufacturing the product, which I agree is also not small). And in terms of energy efficiency, it doesn't have to be that bad: the Apple Silicon line has shown that you can have very efficient hardware with significant AI capabilities. If you design an SoC for that purpose, it can be energy efficient.

Maybe I'm wrong that the approach will become common, but the fact that scaling AI services to millions of users is hard stands.

replies(1): >>42470809 #
stavros ◴[] No.42470809{5}[source]
But here you're assuming that your datacenter can't provide you with X GPUs, yet you can manufacture 100X of them, the 100X being dictated by the 1% utilization.
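
Put as arithmetic (with the 1% utilization figure assumed from above):

    # The asymmetry in the parent's argument, in numbers.
    homes      = 1_000_000
    cloud_gpus = homes * 0.01   # 10,000 always-busy GPUs serve everyone
    home_gpus  = homes          # 1,000,000 mostly idle ones
    print(home_gpus / cloud_gpus)   # 100x the silicon the datacenter
                                    # supposedly couldn't supply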