And then even simple skills can't understand what I'm asking 60% of the time. For the first two years or so after launch it seemed like everything worked pretty well, but since then it's been a frustrating decline.
Currently they are relegated to timers and music, and they can't even manage those half the time anymore.
I believe it boils down to two main issues:
- The narrow AI systems used for intent inference have not scaled with the product features.
- Amazon is stuck and can't significantly improve it using general AI due to costs.
The first point is that the speech-to-intent algorithms currently in production are quite basic, likely based on the state of the art from 2013. Initially, there were few features available, so the device was fairly effective at inferring what you wanted from a limited set of possibilities. Over time, Amazon introduced more and more features to choose from, but the devices didn't get any smarter. As a result, mismatches between actual intent and inferred intent became more common, giving the impression that the device is getting dumber. In truth, it’s probably getting somewhat smarter, but not enough to compensate for the increasing complexity over time.
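To illustrate the mechanism (and only the mechanism), here is a toy sketch. It assumes intents are random points in a unit square, an utterance is its intent's point plus Gaussian noise, and the matcher simply picks the nearest intent. The function name, the noise level and the intent counts are all made up for illustration; nothing here reflects how Alexa actually works.

```python
import random


def toy_intent_accuracy(num_intents, noise=0.05, trials=2000, seed=0):
    """Fraction of noisy 'utterances' matched back to the intended intent.

    Toy model only: the matcher's quality (the noise level) stays fixed
    while the number of candidate intents grows.
    """
    rng = random.Random(seed)
    # Each hypothetical intent is a random point in the unit square.
    intents = [(rng.random(), rng.random()) for _ in range(num_intents)]
    correct = 0
    for _ in range(trials):
        target = rng.randrange(num_intents)
        # The "utterance" is the intended point blurred by Gaussian noise.
        x = intents[target][0] + rng.gauss(0, noise)
        y = intents[target][1] + rng.gauss(0, noise)
        # Nearest-neighbour match: pick the closest intent to the utterance.
        guess = min(
            range(num_intents),
            key=lambda i: (intents[i][0] - x) ** 2 + (intents[i][1] - y) ** 2,
        )
        if guess == target:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for n in (3, 30, 300):
        print(n, toy_intent_accuracy(n))
```

With the same noise level, crowding more intents into the same space leaves less room between them, so the same matcher gets it wrong more often even though it has not gotten any worse.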
The second point is that, clearly, it would be relatively straightforward to create a much smarter Alexa: simply delegate the intent detection to an LLM. However, Amazon can’t do that. By 2019, there were already over 100 million Alexa devices in circulation, and it’s reasonable to assume that number has at least doubled by now. These devices are likely sold at a low margin, and the service is free. If you start requiring GPUs to process millions of daily requests, you would need an enormous, costly infrastructure, which is probably impossible to justify financially—and perhaps even infeasible given the sheer scale of the product.
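For anyone who wants to plug in their own numbers, a back-of-envelope sketch of that cost argument follows. Every parameter is a placeholder: I don't know Amazon's real request volume, per-request GPU time or hardware costs, and the function name is made up for illustration.

```python
def daily_gpu_cost(devices, requests_per_device_per_day,
                   gpu_seconds_per_request, gpu_cost_per_hour):
    """Rough daily GPU bill for serving every request with an LLM.

    All inputs are assumptions supplied by the caller, not known figures.
    """
    total_requests = devices * requests_per_device_per_day
    # Convert total GPU-seconds to GPU-hours.
    gpu_hours = total_requests * gpu_seconds_per_request / 3600
    return gpu_hours * gpu_cost_per_hour


if __name__ == "__main__":
    # Hypothetical inputs: 200 million devices (the "doubled since 2019"
    # guess above); the other three values are pure placeholders.
    cost = daily_gpu_cost(
        devices=200_000_000,
        requests_per_device_per_day=5,
        gpu_seconds_per_request=0.5,
        gpu_cost_per_hour=2.0,
    )
    print(f"Estimated daily GPU cost: ${cost:,.0f}")
```

The result is only as good as the guesses fed in, but the shape of the problem is visible: the bill grows with every device sold and every request made, while revenue per request stays at zero.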
My prediction is that Amazon cannot save the product, and it will die a slow death. It will probably keep working for years but will likely be relegated by most users to a "dumb" device capable of little more than setting alarms, timers, and providing weather reports.
If you want Jarvis-like intelligence to control your home automation system, the vision of a local assistant using local AI on an efficient GPU, as presented by HA, is the one with the most chance of succeeding. Beyond the privacy benefits of processing everything locally, the primary reason this approach may become common is that it scales linearly with the installation.
If you had a cloud-based solution using Echo-like devices, the problem is that you’d need to scale your cloud infrastructure as you sell more devices. If the service is good, this could become a major challenge. In contrast, if you sell an expensive box with an integrated GPU that does everything locally, you deploy the infrastructure as you sell the product. This eliminates scaling issues and the risks of growing too fast.
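A minimal sketch of that difference, assuming a fixed yearly cloud cost per installed device and no device retirements (both simplifications; the function name and numbers are hypothetical):

```python
def cumulative_cloud_cost(devices_sold_per_year, cost_per_device_per_year):
    """Vendor's running cloud bill when every device ever sold must be served.

    Returns the cumulative cost at the end of each year.
    """
    installed = 0
    total = 0.0
    history = []
    for sold in devices_sold_per_year:
        installed += sold  # the installed base only ever grows here
        total += installed * cost_per_device_per_year
        history.append(total)
    return history


if __name__ == "__main__":
    # Hypothetical: 100 devices sold each year, 1 unit of cloud cost per
    # device per year.
    print(cumulative_cloud_cost([100, 100, 100], 1.0))
```

In the cloud model the vendor keeps paying for every device already in the field, so the bill grows with the whole installed base. In the local model the compute is part of the box the buyer pays for once, and the vendor's recurring inference cost is roughly zero.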
I'm guessing people reflexively downvote because they hate Amazon and it could read like a defense. I hate Amazon too, but emotional voting is unbecoming of HN. If you want emotional voting, Reddit is available and enormous.
Amazon is one of the richest companies on the planet, with vast datacenters that power large parts of the internet. If they wanted to improve their AI products they certainly have the resources to do so.
I am sure you know this, but some may not: basically only the hot-word detection runs on the device. It needs to be connected to the Internet for basically everything else. It already costs Amazon.com some money to run this infrastructure. What we are asking for will cost more, and you can't really charge the users more. I personally would definitely not sign up for a paid subscription to use Amazon Alexa.
Perhaps Echo/Alexa entice users to become Prime members, and they're not meant to be market leaders. We can only speculate as outsiders.
My point is that claiming that a product of one of the richest companies on Earth is not as subjectively good as the competition because of financial reasons is far-fetched.
Amazon is a business and frugality is/was a core tenet. Just because they can put Alexa in front of LLMs and use GPU hours to power it doesn't mean that is the best reinvestment of their profits.
The idea of using LLMs for Alexa is so painfully obvious that people all the way from L3 to S Team will have considered it, and Amazon are already doing interesting R&D with genAI, so we should assume that corporate inertia or malaise is not why they haven't. The most feasible explanation from the outside is that it is not commercially viable, especially as a "free" service rather than a subscription. At least with Apple (and Siri is still painfully lacking) you are paying for it by being locked into the Apple ecosystem, paying thousands for their hardware, and paying eye-watering premiums for things like storage on the iPhone.