The era of open voice assistants

1. jfim ◴[20 Dec 24 03:33 UTC] No.42468047[source]▶

That's a pretty timely release considering Alexa and the Google assistant devices seem to have plateaued or are on the decline.

replies(1): >>42468486 #

2. IgorPartola ◴[20 Dec 24 05:18 UTC] No.42468486[source]▶

>>42468047 (TP) #

Curious what you mean by that.

replies(5): >>42468541 #>>42469172 #>>42470796 #>>42472284 #>>42474211 #

3. oaththrowaway ◴[20 Dec 24 05:32 UTC] No.42468541[source]▶

>>42468486 #

For me the Alexa devices I own have gotten worse. Can't do simple things (setting a timer used to be instant, now it takes 10-15 seconds of thinking assuming it heard properly), playing music is a joke (will try to play through Deezer even though I disaled that integration months ago, and then will default to Amazon Music instead of Spotify which is set as the default).

And then even simple skills can't understand what I'm asking 60% of the time. The first maybe 2 years after launch it seemed like everything worked pretty good but since then it's been a frustrating decline.

Currently they are relagated to timers and music, and it can't even manage those half the time anymore.

replies(4): >>42468954 #>>42469393 #>>42470886 #>>42471275 #

4. interludead ◴[20 Dec 24 07:13 UTC] No.42468954{3}[source]▶

>>42468541 #

That aligns with some of the frustration I’ve heard from others. It’s surprising (and disappointing) how these platforms, which seemed to have so much potential early on, have started to feel more like a liability

5. bdavbdav ◴[20 Dec 24 08:06 UTC] No.42469172[source]▶

>>42468486 #

GH is basically abandonware at this stage it seems. They just seem to break random things, and there haven’t been any major updates / features for ages (and Gemini is still a way off for most).

replies(1): >>42471285 #

6. lelag ◴[20 Dec 24 09:00 UTC] No.42469393{3}[source]▶

>>42468541 #

It is, I think, a common feeling among Echo/Alexa users. Now that people are getting used to the amazing understanding capabilities of ChatGPT and the likes, it probably increases the frustration level because you get a hint of how good it could be.

I believe it boils down to two main issues:

- The narrow AI systems used for intent inference have not scaled with the product features.

- Amazon is stuck and can't significantly improve it using general AI due to costs.

The first point is that the speech-to-intent algorithms currently in production are quite basic, likely based on the state of the art from 2013. Initially, there were few features available, so the device was fairly effective at inferring what you wanted from a limited set of possibilities. Over time, Amazon introduced more and more features to choose from, but the devices didn't get any smarter. As a result, mismatches between actual intent and inferred intent became more common, giving the impression that the device is getting dumber. In truth, it’s probably getting somewhat smarter, but not enough to compensate for the increasing complexity over time.

The second point is that, clearly, it would be relatively straightforward to create a much smarter Alexa: simply delegate the intent detection to an LLM. However, Amazon can’t do that. By 2019, there were already over 100 million Alexa devices in circulation, and it’s reasonable to assume that number has at least doubled by now. These devices are likely sold at a low margin, and the service is free. If you start requiring GPUs to process millions of daily requests, you would need an enormous, costly infrastructure, which is probably impossible to justify financially—and perhaps even infeasible given the sheer scale of the product.

My prediction is that Amazon cannot save the product, and it will die a slow death. It will probably keep working for years but will likely be relegated by most users to a "dumb" device capable of little more than setting alarms, timers, and providing weather reports.

If you want Jarvis-like intelligence to control your home automation system, the vision of a local assistant using local AI on an efficient GPU, as presented by HA, is the one with the most chance of succeeding. Beyond the privacy benefits of processing everything locally, the primary reason this approach may become common is that it scales linearly with the installation.

If you had a cloud-based solution using Echo-like devices, the problem is that you’d need to scale your cloud infrastructure as you sell more devices. If the service is good, this could become a major challenge. In contrast, if you sell an expensive box with an integrated GPU that does everything locally, you deploy the infrastructure as you sell the product. This eliminates scaling issues and the risks of growing too fast.

replies(3): >>42469542 #>>42470084 #>>42471009 #

7. freedomben ◴[20 Dec 24 09:38 UTC] No.42469542{4}[source]▶

>>42469393 #

It seems ridiculous to me that this comment is so down voted. It's a thoughtful and interesting comment, and contains a reasonable and even likely explanation for what we've seen, once one puts aside the bottom that Amazon is just evil, which isn't a useful way to think of you truly want to understand the world and motivations.

I'm guessing people reflexively down vote because they hate Amazon and it could read like a defense. I hate Amazon too, but emotional voting is unbecoming of HN. If you want emotional voting reddit is available and enormous.

replies(1): >>42469840 #

8. imiric ◴[20 Dec 24 10:28 UTC] No.42469840{5}[source]▶

>>42469542 #

I didn't downvote it, but claiming that Echo/Alexa are behind because of financial reasons is misguided at best.

Amazon is one of the richest companies on the planet, with vast datacenters that power large parts of the internet. If they wanted to improve their AI products they certainly have the resources to do so.

replies(3): >>42470252 #>>42470407 #>>42473008 #

9. stavros ◴[20 Dec 24 11:05 UTC] No.42470084{4}[source]▶

>>42469393 #

I think the economics here are wrong by orders of magnitude. It doesn't make sense to deploy to the home an expensive GPU that will sit idle 99% of the time, unless running an LLM gets much cheaper, computationally. It's much cheaper to run it on-premise and charge a subscription, otherwise nobody would pay for ChatGPT and would have an LLM rig at home instead.

replies(1): >>42470607 #

10. thanksgiving ◴[20 Dec 24 11:32 UTC] No.42470252{6}[source]▶

>>42469840 #

How do you justify to your manager to spend (and more importantly commit to spending for a long time) hundreds of millions of dollars in aws resources every year? Sure, you already have the hardware but that's a different org, right? You can't expect them to give you those resources for free. Also, voice needs to be instant. You can't say 'Well, the AWS instances are currently expensive. Try again when my spot prices are lower."

I am sure you know this but maybe some don't know that basically only the hot word detection is on device. It needs to be connected to the Internet for basically everything else. It already costs Amazon.com some money to run this infrastructure. What we are asking will cost more and you can't really charge the users more. I personally would definitely not sign up for a paid subscription to use Amazon Alexa.

11. baq ◴[20 Dec 24 12:05 UTC] No.42470407{6}[source]▶

>>42469840 #

Alexa is probably a cool billion under or something. They never figured out how to make money with it.

12. lelag ◴[20 Dec 24 12:42 UTC] No.42470607{5}[source]▶

>>42470084 #

You are right, but that's not my point. The point is that it's difficult to scale in the cloud products that requires lots of AI workloads.

Here, home assistant is telling you: you can use your own infra (most people won't) or you can use our cloud.

It works because most likely the user base will be rather small and home assistant can get cloud resources as if it was infinite on that scale.

If their product was amazing, and suddenly millions of people wanted to buy the cloud version, they would have a big problem: cloud infrastructure is never infinite at scale. They would be limited by how much compute their cloud provider is able/willing to sell them, rather than how much of that small boxes they could sell, possibly loosing the opportunity to corner the market with a great product.

If you package everything, you don't have that problem (you only have the one to be able to make the product, which I agree is also not small). But in term of energy efficiency, it also does not have to be that bad: the apple silicon line has shown that you can have very efficient hardware with significant AI capabilities, if you design a SOC for that purpose, it can be energy efficient.

Maybe I'm wrong that the approach will get common, but the fact that scaling AI services to millions of users is hard stand.

replies(1): >>42470809 #

13. lolinder ◴[20 Dec 24 13:10 UTC] No.42470796[source]▶

>>42468486 #

On the Google side it's become basically useless for anything beyond interacting with local devices and setting timers and reminders (in other words, the things that FOSS should be able to do very easily). Its only edge over other options used to be answering questions quickly without having to pull out a screen, but now it refuses to answer anything (likely because Google Search has removed their old quick answers in favor of Gemini answers).

14. stavros ◴[20 Dec 24 13:12 UTC] No.42470809{6}[source]▶

>>42470607 #

But here you're assuming that your datacenter can't provide you with X GPUs, but you can manufacture 100X, which is dictated by 1% utilization.

15. IgorPartola ◴[20 Dec 24 13:25 UTC] No.42470886{3}[source]▶

>>42468541 #

That’s interesting because I have a bunch of Echos of various types in my house and my timers and answers are instant. Is it possible your internet connection is wonky or you have a slow DNS server or congested Wi-Fi? I don’t have the absolute newest devices but the one in my bedroom is the very original Echo that I got during their preview stage, the one in my kitchen is the Echo Show 7” and I have a bunch of puck ones and spherical ones (don’t remember the generations) around the house. One did die at one point after years of use and got replaced but it was in my kids room so I suspect it was subject to some abuse.

replies(1): >>42473079 #

16. IgorPartola ◴[20 Dec 24 13:46 UTC] No.42471009{4}[source]▶

>>42469393 #

This is very well thought out but I think your premise is a bit wrong. I have about a dozen Echos of various generations in my house. The oldest one is the very original from the preview stage. They still do everything I want them to and my entire family still uses them daily with zero frustration.

Local GPU doesn’t make sense for some of the same reasons you list. First, hardware requirements are changing rapidly. Why would I spend say $500 on a local GPU setup when in two years the LLM running on it will slow to a crawl due to limited resources? Probably would make more sense to rent a GPU on the cloud and upgrade as new generations come out.

Amazon has the opposite situation: their hardware and infra is upgraded en masse so different economies. Also while your GPU is idling at 20-30W while you aren’t home they can have 100% utilization of their resources because their GPUs are not limited to one customer at a time. Plus they can always offload the processing by contracting OpenAI or similar. Google is in an even better position to do this. Running a local LLM today doesn’t make a lot of sense, but it probably will at some point in like 10 years. I base this on the fact that the requirements for a device like a voice assistant are limited so at some point the hardware and software will catch up. We saw this with smartphones: you can now go 5 years without upgrading and things still work fine. But that wasn’t the case 10 years ago.

Second, Amazon definitely goofed. They thought people would use the Echos for shopping. They didn’t. Literally the only uses for them are alarms and timers, controlling lights and other smart home devices, and answering trivia questions. That’s it. What other requirements do you have that don’t fall in this category? And the Echos do this stuff incredibly well. They can do complex variations too, including turning off the lights after a timer goes off, scheduling lights, etc. Amazon is basically giving these devices away but the way to pivot this is to release a line of smart devices that connect to the Echos: smart bulbs and switches, smart locks, etc. They do have TVs which you can control with an Echo fairly well (and it is getting better). An ecosystem of smart devices that seamlessly interoperate will dwarf what HA has to offer (and I say this as someone who is firmly on HA’s side). And this is Amazon’s core competency: consumer devices and sales.

If your requirement is that you want Jarvis, it’s not the voice device part of it that you want. You want what it is connected to: a self driving car you can summon, DoorDash you can order by saying “I want a pizza”, a phone line so it can call your insurance company and dispute a claim on your behalf.

Now the last piece here is privacy and it’s a doozy. The only way to solve this for Amazon is to figure out some form of encrypted computation that allows for your voice prompts to be processed without them ever hearing clear voice versions. Mathematically possible, practically not so much. But clearly consumers don’t give a fuck whatsoever about it. They trust Amazon. That’s why there are hundreds of millions of these devices. So in effect while people on HN think they are the target market for these devices, they are clearly the opposite. We aren’t the thought leaders, we are the Luddites. And again I say this as someone who wishes there was a way to avoid the privacy issue, to have more control over my own tech, etc. I run an extensive HA setup but use Echos for the voice control because at least for now they are be best value. I am excited about TFA because it means there might be a better choice soon. But even here a $59 device is going to have a hard time competing with one that routinely go on sale for $19.

17. mrweasel ◴[20 Dec 24 14:21 UTC] No.42471275{3}[source]▶

>>42468541 #

Amazon also fired a large number of people from the Alexa team last year. I don't really think Alexa is a major priority for Amazon at this point.

I don't blame them, sure there are millions of devices out there, but some people might own five device. So there aren't as many users as there are devices and they aren't making them any money once bought, not like the Kindle.

Frankly I know shockingly few people who uses Siri/Alexa/Google Assistant/Bixby. It's not that voice assistants don't have a use, be it is a much much small use case than initially envisioned and there's no longer the money to found the development, the funds went into blockchain and LLMs. Partly the decline is because it's not as natural an interface as we expected, secondly: to be actually useful, the assistants need access to control things that we may not be comfortable with, or which may pose a liability to the manufacturers.

18. cachvico ◴[20 Dec 24 14:22 UTC] No.42471285{3}[source]▶

>>42469172 #

Google Home's Nest integration is recent and top-notch though.

Hopefully in a year they'll have rolled out the Gemini integration and things will be back on track.

replies(1): >>42504984 #

19. stickfigure ◴[20 Dec 24 16:09 UTC] No.42472284[source]▶

>>42468486 #

I was an early adopter of google home, have had several generations (including the latest). I quite like the devices, but the voice recognition seems to be getting worse not better. And the Pandora integration crashes frequently.

In addition, it's a moron. I'm not sure it's actually gotten dumber, but in the age of chatgpt, asking google assistant for information is worse than asking my 2nd grader. Maybe it will be able to quote part of a relevant web page, but half the time it screws that up. I just want it to convert my voice to text, submit it to chatgpt or claude, and read the response back to me.

All that said, the audio quality is good and it shows pictures of my kid when idle. If they suddenly disappeared I would replace them.

20. gorbachev ◴[20 Dec 24 17:30 UTC] No.42473008{6}[source]▶

>>42469840 #

Even the richest company in the world doesn't run unprofitable projects forever.

Just see Killed by Google.

replies(1): >>42473646 #

21. creeble ◴[20 Dec 24 17:38 UTC] No.42473079{4}[source]▶

>>42470886 #

I too get pretty consistent response and answers from Alexa these days. There has been some vague decline in the quality of answers (I think sometime back they removed the ability to ask for Wikipedia data), but have no trouble with timers and the few linked wemo switches I have.

I’m also the author of an Alexa skill for a music player (basic “transport” control mostly) that i use every day, and it still works the same as it always did.

Occasionally I’ll get some freakout answer or abject failure to reply, but it’s fairly rare. I did notice it was down for a whole weekend once; that’s surely related to staffing or priorities.

22. imiric ◴[20 Dec 24 18:44 UTC] No.42473646{7}[source]▶

>>42473008 #

That depends on the company. There is precedent of large companies keeping unprofitable projects alive because they can make up for it in other ways, or it's good for marketing, etc. I.e. the razor and blades business model.

Perhaps Echo/Alexa entice users to become Prime members, and they're not meant to be market leaders. We can only speculate as outsiders.

My point is that claiming that a product of one the richest companies on Earth is not as subjectively good as the competition because of financial reasons is far-fetched.

replies(1): >>42478573 #

23. throwawayq3423 ◴[20 Dec 24 19:38 UTC] No.42474211[source]▶

>>42468486 #

Google and Amazon refuse to put GenAI into their existing speakers (which barely function). No doubt they want a new product launch to charge more.

24. djtango ◴[21 Dec 24 09:37 UTC] No.42478573{8}[source]▶

>>42473646 #

Just because they're rich doesn't mean that they can or will fund features like this if they can't justify the business case for it.

Amazon is a business and frugality is/was a core tenet. Just because they can put Alexa in front of LLMs and use GPU hours to power it doesn't mean that is the best reinvestment of their profits.

The idea of using LLMs for Alexa is so painfully obvious that people all the way from L3 to S Team will have considered it, and Amazon are already doing interesting R&D with genAI so we should assume that it isn't corporate inertia or malaise for why they haven't. The most feasible explanation from the outside is that it is not commercially viable especially "free" versus a subscription model. At least with Apple (and Siri is still painfully lacking) you are paying for it being locked into the Apple ecosystem and paying thousands for their hardware and paying eyewatering premiums for things like storage on the iPhone

25. bdavbdav ◴[24 Dec 24 21:17 UTC] No.42504984{4}[source]▶

>>42471285 #

I’d not go as far as too notch. We’ve reverted as family members don’t get notifications (like from the doorbell)