By "I don't fully understand," I mean just that. There's a lot of marketing copy, but there's a lot I'd like to understand better before plopping down $$$ for a unit. The answers might be reasonable.
Ideally, I'd be able to experiment with a headset first, and if it works well, upgrade to the $59 unit.
I'd love to just have a README with a getting-started tutorial, play with it, and then upgrade if it does what I want.
Again: none of this is a complaint. I assume much of this is coming once we're past the preview edition, or is perhaps there already and my search skills are failing me.
Same for their voice assistant. You can buy their hardware and get started right away, or you can place your own mics and speakers around your home and it will still work. You can buy your own beefy hardware and run your own LLM.
The possibilities with Home Assistant are endless. Thanks to this community for breaking the barriers created by big tech.
I noticed recently there weren't any good open source hardware projects for voice assistants with a focus on privacy. There's another project I've been thinking about where I think the privacy aspect is important, and figuring out a good hardware stack has been a process. The project I want to work on isn't exactly a voice assistant, but it has the same ultimate hardware requirements.
Something I'm kinda curious about: it sounds like they're planning on a sorta batch-manufacturing-by-resellers model, which I guess is pretty standard for hardware sales. But why not do a sorta "group buy" approach? I guess there's nothing stopping it from happening in conjunction.
I've had an idea floating around for a site that enables group buys for open source hardware (or 3D printed items), and that also acts like or integrates with GitHub wrt forking/remixing.
I really hope we see some open-source machine-learned systems emerge.
I saw Insta360 announce their video conferencing solution today. The optics look pretty medium, nothing wild, but Insta360 is so good at video that I expect it'll be great. But there's a huge 14-microphone array on it, and that's the hard job: figuring out how to get good audio from speakers in a variety of locations around a room. It really made me wish for more open source footing here, some promising start, be it the conference room or the open living space. I've given this all of 60 seconds, and was kinda hopeful because heck yeah Home Assistant, but my initial read isn't super promising; it doesn't look like this is starting the proper software base needed to listen well to the world.
https://petapixel.com/2024/12/17/the-insta360-connect-is-a-2...
Llama and Whisper are already public, so that should help innovation in this area.
I haven't been able to quite get the Llama vision models working, but I suppose with future releases it should work as well as Gemini at finding bounding boxes of UI elements.
My only remaining wish is that I can replace Siri with this (without needing some workaround)
It'd be awesome if they open sourced that model, though, or at least published which models they're using. But I think that's unlikely to happen, because Home Assistant is sorta a funnel to Nabu Casa.
That said, from what I can find, it sounds like Assist can be run without the hardware, either with or without the cloud upgrade. So you could definitely use your own hardware, headset, speakers, etc. to play with Assist
With an open source and potentially local-only device, you can have your voice assistant and keep your privacy.
It’s not exactly batteries-included, and doesn’t exercise the on-device wake word detection that satellite hardware would provide, but it’s doable.
But I don’t know that the unit will be an “upgrade” over most headsets. These devices are designed to be cheap and low-power, and have to function in tougher scenarios than speaking directly into a boom mic.
- Full privacy: nothing goes to the "cloud"
- Non-shitty microphones and processing: I want to be able to be heard without having to yell, repeat, or correct
- No wake words: it should listen to everything, process it, and understand when it's being addressed. Since everything is private and local, this is now doable
- Conversational: it should understand when I've finished talking, have the ability to be interrupted, all with low latency
- Non-stupid: it's 2024, and Alexa and Siri and Google are somehow absolutely abysmal at doing even the basics
- Complete: I don't want to use an app to get stuff configured. I want everything to be controlled via voice
Is this a fully private, open source alternative to Alexa that by definition requires a local CPU to run?
Is the device supposed to be the nerve center of IoT devices?
Can it get on the Wi-Fi to do web lookups on command (music, Google, etc.)?
The first thought I had when encountering LLMs was that they could finally make these devices understand you and make them actually useful... without my needing to know some prescripted keywords.
Even humans struggle with this one - that's what names are for!
"OK, Google, turn lights on" "Check your connection and try again"
As far as I can tell, if you have Home Assistant + this new device, you've fixed that problem.
This device provides the microphone, speaker, and WiFi to do wake-word detection, capture your input, send it off to your HA instance, and reply to you with HA’s processed response. Whether your HA instance phones out to the internet to produce the response is up to you and how you’ve configured it.
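For the curious, that flow maps onto ESPHome's voice_assistant component. A minimal sketch, assuming an ESP32-S3 board with an I2S mic; the pins, board, and glue here are made up for illustration, and this is not the Voice PE's actual firmware:

    # Hypothetical ESPHome satellite config (pins assumed, details omitted)
    esp32:
      board: esp32-s3-devkitc-1
      framework:
        type: esp-idf

    i2s_audio:
      i2s_lrclk_pin: GPIO5      # assumed wiring
      i2s_bclk_pin: GPIO6       # assumed wiring

    microphone:
      - platform: i2s_audio
        id: satellite_mic
        adc_type: external
        i2s_din_pin: GPIO7      # assumed wiring

    micro_wake_word:
      models:
        - model: okay_nabu      # on-device wake word; nothing streams until it fires
      on_wake_word_detected:
        - voice_assistant.start:

    voice_assistant:
      microphone: satellite_mic
      # Captured speech streams to your HA instance's Assist pipeline;
      # whether HA then calls out to the internet is your pipeline config.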
But for a first-time, unknown product? You get a lot fewer interested parties. Lots of people want to wait for tech reviews and blog posts before committing to it. And if group buys are the only way to get them, availability will be inconsistent for the foreseeable future. I don’t want one voice assistant; I want 5-20, one for every space in my house. But I am not prepared to commit to 20 devices of a first run, and I am not prepared to buy one and hope I’ll get the opportunity to buy more later if it doesn’t flop. Stability of the supply chain is an important signal to consumers that the device won’t be abandoned.
As long as this thing works and there's demand for it, I doubt we'll ever run out of people willing to connect an XU316 and some mics to an ESP32-S3 and sell it to you with HA's open source firmware flashed to it, whether or not HA themselves are still willing to.
I cannot wait to buy 5 or more of these to replace Alexa. HA is the brain of my house and up till now Alexa provided the best hardware to interact with HA (IMHO) but I'd love something first-party.
And then even simple skills can't understand what I'm asking 60% of the time. For maybe the first 2 years after launch it seemed like everything worked pretty well, but since then it's been a frustrating decline.
Currently they are relegated to timers and music, and they can't even manage those half the time anymore.
There is. I've used HA with their default Assist pipeline (Cloud HA STT, Cloud HA LLM, Cloud HA TTS) and I've also plugged in different providers at each step (both remote and local for each part: STT/LLM/TTS), and it's super cool. Their default LLM isn't great, but it works; plugging in OpenAI made it work way better. My local models weren't great in speed, but I don't have hardware dedicated to this purpose (currently); seeing an entire local pipeline run was amazing for the promise of it in the future. It's too slow (on my hardware), but we are so close to local models (STT/TTS could be improved as well, but they are much easier to do locally already).
If this new HA hardware comes even close to performing as well as the Echos in my house (low bar), I'll replace them all.
But also, a postfix wake word would be natural if it was recording all the time: "turn on the lights, Google", for instance.
And if not, I would be curious to know why it hasn't been fully open sourced.
I was originally somewhat frustrated, but overall, it's much better (let's be honest, YAML sucks) and more user friendly (by that I mean having a form with pre-filled fields is easier than having to copy paste YAML).
Bear in mind that a $50 google home or Alexa mini(?) is always going to be whatever google deem it to be. This is an open device which can be whatever you want it to be. That’s a lot of value in my eyes.
Have they gotten rid of any YAML configs, with things that are now UI only? My understanding was that they've just been building more UI for configuring things and so now default recommend people away from YAML (which seems like the right choice to me).
I used to think so too. But then Kickstarter proved that actually, as long as you have a good advertising style, communicate well, and get lucky, you can get people to contribute literal millions for a product that hasn't even reached the blueprints stage yet.
I believe it boils down to two main issues:
- The narrow AI systems used for intent inference have not scaled with the product features.
- Amazon is stuck and can't significantly improve it using general AI due to costs.
The first point is that the speech-to-intent algorithms currently in production are quite basic, likely based on the state of the art from 2013. Initially, there were few features available, so the device was fairly effective at inferring what you wanted from a limited set of possibilities. Over time, Amazon introduced more and more features to choose from, but the devices didn't get any smarter. As a result, mismatches between actual intent and inferred intent became more common, giving the impression that the device is getting dumber. In truth, it’s probably getting somewhat smarter, but not enough to compensate for the increasing complexity over time.
The second point is that, clearly, it would be relatively straightforward to create a much smarter Alexa: simply delegate the intent detection to an LLM. However, Amazon can’t do that. By 2019, there were already over 100 million Alexa devices in circulation, and it’s reasonable to assume that number has at least doubled by now. These devices are likely sold at a low margin, and the service is free. If you start requiring GPUs to process millions of daily requests, you would need an enormous, costly infrastructure, which is probably impossible to justify financially—and perhaps even infeasible given the sheer scale of the product.
My prediction is that Amazon cannot save the product, and it will die a slow death. It will probably keep working for years but will likely be relegated by most users to a "dumb" device capable of little more than setting alarms, timers, and providing weather reports.
If you want Jarvis-like intelligence to control your home automation system, the vision of a local assistant using local AI on an efficient GPU, as presented by HA, is the one with the most chance of succeeding. Beyond the privacy benefits of processing everything locally, the primary reason this approach may become common is that it scales linearly with the installation.
If you had a cloud-based solution using Echo-like devices, the problem is that you’d need to scale your cloud infrastructure as you sell more devices. If the service is good, this could become a major challenge. In contrast, if you sell an expensive box with an integrated GPU that does everything locally, you deploy the infrastructure as you sell the product. This eliminates scaling issues and the risks of growing too fast.
I'm guessing people reflexively down vote because they hate Amazon and it could read like a defense. I hate Amazon too, but emotional voting is unbecoming of HN. If you want emotional voting reddit is available and enormous.
The ReSpeaker has 4 mics and can easily cancel out the noise introduced by a custom external speaker.
For most, yes. But some included integrations are UI-only. For all of those I've had to migrate, it's been a single click plus commenting out lines, and the config has been a breeze (stuff like just an API key/IP address plus 1-2 optional params).
That's the part I can't do on my own, and then I'll take care of the LLMs myself.
Amazon is one of the richest companies on the planet, with vast datacenters that power large parts of the internet. If they wanted to improve their AI products they certainly have the resources to do so.
They have released the designs for the Yellow, so I assume it will all come. https://github.com/NabuCasa/yellow
Both of the ReSpeaker products in the non-discontinued section (ReSpeaker Lite, ReSpeaker 2-Mics Pi HAT) have only 2 mics, so it appears that things are converging in that direction.
It was only after researching later that I discovered that this wasn't currently possible, and the recommended approach was to buy some replacement internals that cost more than the device itself (and, if I recall correctly, more than the new Home Assistant Voice Preview Edition).
Isn't openHAB an existing popular alternative?
I am sure you know this, but maybe some don't: basically only the hot-word detection is on-device. It needs to be connected to the Internet for basically everything else. It already costs Amazon some money to run this infrastructure. What we are asking for will cost more, and you can't really charge the users more. I personally would definitely not sign up for a paid subscription to use Amazon Alexa.
I just think this time around is different. The openly available Whisper models give them amazing STT, and LLMs can far more easily be adapted for the NLU portion. The hardware is also dirt cheap, which makes it better suited to a narrow use case.
This one looks like it can recognize your voice very well, even when music is playing.
Because... when it works, it's amazing. You get that Star Trek wake word (KHUM-PUTER!), you can connect your favorite LLM to it (ChatGPT, Claude Sonnet, Ollama), you can control your home automation with it and it's as private as you want.
I ordered two of these, if they are great, I will order two more. I've been waiting for this product for years, it's hopefully finally here.
Here, Home Assistant is telling you: you can use your own infra (most people won't) or you can use our cloud.
It works because the user base will most likely be rather small, and Home Assistant can get cloud resources as if they were infinite at that scale.
If their product was amazing, and suddenly millions of people wanted to buy the cloud version, they would have a big problem: cloud infrastructure is never infinite at scale. They would be limited by how much compute their cloud provider is able/willing to sell them, rather than by how many of those small boxes they could sell, possibly losing the opportunity to corner the market with a great product.
If you package everything, you don't have that problem (you only have the problem of being able to make the product, which I agree is also not small). But in terms of energy efficiency, it also doesn't have to be that bad: the Apple Silicon line has shown that you can have very efficient hardware with significant AI capabilities; if you design an SoC for that purpose, it can be energy efficient.
Maybe I'm wrong that the approach will become common, but the fact stands that scaling AI services to millions of users is hard.
Home Assistant seems nearly impossible to beat on that specific metric; it appears to be the single biggest project in terms of contributions from a wide community. Makes sense: Home Assistant tries to do a lot of things, and succeeds at many of them.
Nowadays, HA has more of the features I would want and other external projects exist to create your own dashboards that take advantage of native controls.
Today I’m using Homey because I’m still a sucker for design and UX after a long day of coding boring admin panels in the day job, but I think in another few years when the hardware starts to show its age that I will move to home assistant. Hell, there exists an integration to bring HA devices into Homey but that would require running two hubs and potentially duplicating functionality. We shall see.
Am I missing something? Is it that these are just those you know are sharing details, and you can scale that up by a known percentage? :)
Right now I only use Alexa for smart house control and setting timers
Local GPU doesn’t make sense for some of the same reasons you list. First, hardware requirements are changing rapidly. Why would I spend say $500 on a local GPU setup when in two years the LLM running on it will slow to a crawl due to limited resources? Probably would make more sense to rent a GPU on the cloud and upgrade as new generations come out.
Amazon has the opposite situation: their hardware and infra is upgraded en masse so different economies. Also while your GPU is idling at 20-30W while you aren’t home they can have 100% utilization of their resources because their GPUs are not limited to one customer at a time. Plus they can always offload the processing by contracting OpenAI or similar. Google is in an even better position to do this. Running a local LLM today doesn’t make a lot of sense, but it probably will at some point in like 10 years. I base this on the fact that the requirements for a device like a voice assistant are limited so at some point the hardware and software will catch up. We saw this with smartphones: you can now go 5 years without upgrading and things still work fine. But that wasn’t the case 10 years ago.
Second, Amazon definitely goofed. They thought people would use the Echos for shopping. They didn’t. Literally the only uses for them are alarms and timers, controlling lights and other smart home devices, and answering trivia questions. That’s it. What other requirements do you have that don’t fall in this category? And the Echos do this stuff incredibly well. They can do complex variations too, including turning off the lights after a timer goes off, scheduling lights, etc. Amazon is basically giving these devices away but the way to pivot this is to release a line of smart devices that connect to the Echos: smart bulbs and switches, smart locks, etc. They do have TVs which you can control with an Echo fairly well (and it is getting better). An ecosystem of smart devices that seamlessly interoperate will dwarf what HA has to offer (and I say this as someone who is firmly on HA’s side). And this is Amazon’s core competency: consumer devices and sales.
If your requirement is that you want Jarvis, it’s not the voice device part of it that you want. You want what it is connected to: a self driving car you can summon, DoorDash you can order by saying “I want a pizza”, a phone line so it can call your insurance company and dispute a claim on your behalf.
Now the last piece here is privacy and it’s a doozy. The only way to solve this for Amazon is to figure out some form of encrypted computation that allows for your voice prompts to be processed without them ever hearing clear voice versions. Mathematically possible, practically not so much. But clearly consumers don’t give a fuck whatsoever about it. They trust Amazon. That’s why there are hundreds of millions of these devices. So in effect while people on HN think they are the target market for these devices, they are clearly the opposite. We aren’t the thought leaders, we are the Luddites. And again I say this as someone who wishes there was a way to avoid the privacy issue, to have more control over my own tech, etc. I run an extensive HA setup but use Echos for the voice control because at least for now they are the best value. I am excited about TFA because it means there might be a better choice soon. But even here a $59 device is going to have a hard time competing with one that routinely goes on sale for $19.
They sell UL rated models, have an option for cloud connectivity but zero requirement, your switch still works if the Shelly loses connectivity with whatever home automation server you have, and it's a small box that you wire in behind the switch.
What the top level comment is asking for, completely unrelated to the article mind you, is to have a smart device in the form factor of a light switch that you can hook into your home assistant system.
The problem they likely have (I have it too) is that you set HA up and it can control smart plugs, smart thermostats, etc, but it can't control 99% of the existing lights in your house because they are wired to dumb lightswitches. Instead of some mechanical finger flicking a switch or something, why not uninstall the existing light switch and replace it with a smart one.
I think the problem with this setup is that it needs to be wifi connected, and if you embed an esp32 inside a wall it will get exactly zero signal. Maybe with external antennas hidden in the switch outer case.
I don't blame them; sure, there are millions of devices out there, but some people might own five devices. So there aren't as many users as there are devices, and the devices aren't making Amazon any money once bought, not like the Kindle.
Frankly, I know shockingly few people who use Siri/Alexa/Google Assistant/Bixby. It's not that voice assistants don't have a use, but it is a much, much smaller use case than initially envisioned, and there's no longer the money to fund the development; the funds went into blockchain and LLMs. Partly the decline is because it's not as natural an interface as we expected; secondly, to be actually useful, the assistants need access to control things that we may not be comfortable with, or which may pose a liability to the manufacturers.
A 125H box may be three times the price of an N100 box, but the power draw is about the same (6W idle, 28W max, with turbo off anyway), and with the Arc iGPU the prompt processing is in the hundreds of tokens per second, so near-instant replies to longer queries are doable.
They even used a wake word in Star Trek, FWIW.
I bought two the second they were announced. I already use the software stack with the M5 Atoms; they are terrible devices, but the software works well enough for me.
A lot has changed in the open source ecosystem since commercial assistants were first launched. We have reliable open source wakeword detectors, and cheap/free LLMs can do the intent parsing, response generation, and even action calling.
I guess the benefits that came to mind are:
- An alternative, crowdsourced route for sourcing hardware, to avoid things like that Raspberry Pi shortage (although if it's due to broader supply chain issues then this doesn't necessarily help)
- Hardware forks! If someone wanted a version with a more powerful ESP32, or a GPS, or another mic, or an enclosure for a battery and charging and all that, took the time to fork the design to add these features, and found X other users interested in the fork to get it produced... (of course I might be betraying my ignorance of how easy it is to set up this sort of alternative manufacturing chain, or what unit volumes are necessary to make this kind of forking economical)
Install the whole thing on top of stock Debian ("supervised") and you get a full OS to use.
You get a fully integrated MQTT broker with full provisioning - you don't need a webby API - you have an IoT one instead!
This is a madly fast moving project with a lot of different audiences. You still have loads of choice all tied up in the web interface.
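To the MQTT point: once the broker is provisioned, exposing a device over MQTT is a few lines of YAML. A sketch with hypothetical names and topics:

    mqtt:
      sensor:
        - name: "Garage temperature"
          state_topic: "home/garage/temperature"   # hypothetical topic
          unit_of_measurement: "°C"
          device_class: temperature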
Alexa, on the other hand, won't even allow a third party app to read its shopping list. It's no longer clear to me why Alexa even exists any more except as a kitchen timer.
1. https://www.home-assistant.io/blog/2016/01/19/perfect-home-a...
So yeah, Alexa is a dumb product... for Amazon. No one uses Alexa to buy anything from Amazon because the only way you can be sure of what you're ordering from Amazon is to be looking at the site. Otherwise you might get cat food from "JOYFUNG BEST Brand 2024" and not Purina.
Voice assistants for home automation, like what Home Assistant is offering, are awesome. And this in particular is exciting exactly because of Alexa's failure as a product. Amazon clearly does not care about Alexa now; it's been getting worse as they try to shoehorn in more and more monetization strategies.
In addition, it's a moron. I'm not sure it's actually gotten dumber, but in the age of chatgpt, asking google assistant for information is worse than asking my 2nd grader. Maybe it will be able to quote part of a relevant web page, but half the time it screws that up. I just want it to convert my voice to text, submit it to chatgpt or claude, and read the response back to me.
All that said, the audio quality is good and it shows pictures of my kid when idle. If they suddenly disappeared I would replace them.
I'm also opted OUT of the analytics.
I’m assuming they eventually want to create their own LLM, and something privacy focused would be a good match for their customers. I don’t know how they feel about open source though.
(Yes, I do understand that "privacy" here is mostly about not sending it for processing to third parties.)
I started using the Box-3 with heywillow, which did amazing input and processing using ML on my GPU, but the speaker is awful. I built a speaker of my own using a Raspberry Pi Zero 2 W, a DAC, and some speakers in a 3D printed enclosure I designed, and added a shim to the server so that responses came from my speaker rather than the cheap/tiny speaker in the Box-3. I'll likely do the same now with the Voice PE, but I'm hoping that the Grove connector can be used to plonk it on top of a higher quality speaker unit and make it into a proper music player too.
As soon as I have it in my hands, I intend to get straight to work looking at a way to modify my speaker design to become an addon "module" for the PE.
I currently do STT with heywillow[0] and an S3-Box-3, which uses an ML model running on a server I have to do incredibly fast, incredibly accurate STT. It uses Coqui XTTS for TTS, with a very high quality neural voice; you can also clone a voice by supplying it with a few seconds of audio (I tested cloning my own with frightening results).
Playback to a decent speaker can be done in a bunch of ways; I wrote a shim that captures the TTS request to Coqui and forwards it to a Pi based speaker I built, running MPD, which then requests the audio from the TTS server (Coqui) and plays it back on a higher quality speaker than the crappy ones built into the voice-input devices.
If you just want to use what's available in HA, there's all of the Wyoming stuff: openWakeWord (not necessary if you're using this new Voice PE, because it does on-device wake word), Piper or MaryTTS (or others) for TTS, and Whisper (faster-whisper) for STT, or hook in something else you want to use. You can additionally use the Ollama integration to hook it into an Ollama model running on higher end hardware for proper LLM based reasoning.
[0] heywillow.io
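If you'd rather self-host the Wyoming pieces than use the add-ons, a rough docker-compose sketch (image names and ports are the rhasspy project defaults as I understand them; verify against the current docs):

    services:
      whisper:
        image: rhasspy/wyoming-whisper
        command: --model small-int8 --language en
        ports:
          - "10300:10300"   # point HA's Wyoming integration here for STT
      piper:
        image: rhasspy/wyoming-piper
        command: --voice en_US-lessac-medium
        ports:
          - "10200:10200"   # and here for TTS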
I have an office-style desk-phone (SNOM) connected to a SIP server and I can pick the receiver up and talk to the Assistant, but you can plug in any way you like to get the audio to/from HA.
With your phone, wake words are usually locked down by Apple/Google so you can't really have it hands-free, and that's the problem this device is solving; not the audio input itself, but the wake-word/handfree input.
On an Android phone, you can replace the Google Assistant with the Home Assistant one, but you still have to activate it the usual way, press a button or launch the app etc.
The hardware in those devices is generally better, most of them have much better speakers, but they're locked down, the wake-word detection hardware isn't open or accessible so changing it to do what we need would be difficult, and you're just hoping there's a way in.
Existing examples of opening them (as in freedom) replace the PCB entirely, which puts you back to square one of needing open hardware.
This feels like the right approach to me; I've been building my own devices for this purpose with off-the-shelf parts, and designing enclosures, but this is much sleeker; I just hope an add-on or future version comes with much better audio out (speakers) because that's where it and things like it (e.g. the S3-Box-3) are really lacking.
The audio out is terrible, so I wrote a shim server that captures the request to the TTS server for heywillow and sends it to a speaker I built myself, running MPD on a Pi with a nice DAC, and have it play the responses instead of the Box-3's tiny speaker.
I don't expect the audio-out on this to be much better with its tiny speaker, but at least it has a 3.5mm jack.
I'm going to look into what that Grove port can do too and perhaps build a new speaker "module" that the Voice PE can sit on top of to make it a proper music device.
If it wasn't fast or accurate for you, what were you running it on? I'm using the large model on a Tesla GPU in a Ryzen 9 server, with the XTTS-2 (Coqui) branch.
The thing about ML based STT/TTS and the reasoning/processing is that you get better performance the more hardware you throw at it. I'm using nearly £4k worth of hardware to do it. Is it worth it? No. Is it reasonable? Also no. But I already had the hardware and it's doing other things.
I'll switch over to Assist and run Ollama instead now there's some better hardware with on-device wake-word from Nabu.
https://design.home-assistant.io/#concepts/home
https://developers.home-assistant.io/docs/configuration_yaml_index
https://github.com/home-assistant/architecture/blob/master/adr/0010-integration-configuration.md
...usually there's YAML kicking around the backend, but for normal usage and normal users, the goal is to be able to configure all (most) things via the UI. I've had to drop to YAML to configure (e.g.) writing stats to InfluxDB/Grafana vs. SQLite (or something), or maybe to drop in or update an API key or a non-standard host/port, but 99% of the time the config is baroque, yet usable via the web app.
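For instance, the stats-export case still looks something like this in configuration.yaml (host, org, and bucket names here are hypothetical; check the InfluxDB integration docs for your API version):

    influxdb:
      api_version: 2
      host: influxdb.local          # hypothetical host
      port: 8086
      token: !secret influxdb_token
      organization: home            # hypothetical org
      bucket: home_assistant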
I’m also the author of an Alexa skill for a music player (basic “transport” control mostly) that I use every day, and it still works the same as it always did.
Occasionally I’ll get some freakout answer or abject failure to reply, but it’s fairly rare. I did notice it was down for a whole weekend once; that’s surely related to staffing or priorities.
At various times in the past, the teams involved in such projects have at least prototyped extremely invasive features with those in-home devices. For example, one engineer I've visited with from a well-known in-home device manufacturer worked on classifiers that could distinguish between two people having sex and one person attacking another in audio captured passively by the microphones.
As the corporate culture and leadership shifts over time I have marginal confidence that these prototypes will perpetually remain undeveloped or on-device only. Apple, for instance, has decided to send a significant amount of personal data to their "Private Cloud" and is taking the tactic of opening "enough" if its infrastructure for third-party audit to make an argument that the data they collect will only be used in a way that the user is aware and approves of. Maybe Apple can get something like that to a good enough state, at least for a time. However, they're inevitably normalizing the practice. I wonder how many competitors will be as equally disciplined in their implementations.
So my takeaway is this: If there exists a pathway between a microphone and the Internet that you are not in 100% control over, it's not at all unreasonable to expect that anything and everything that microphone picks up at any time will be captured and stored by someone else. What happens with that audio will -- in general -- be kept out of your knowledge and control so long as there is insufficient regulatory oversight.
A 3090 is too expensive and power hungry. Maybe a 3060 12GB? Is there anything in the "workstation" lineup that is more efficient, especially since I don't need the video outs?
I'm sure there are things they could do to better support the power-user engineer use case, but at the end of the day it's a self-hosted web app written in Python that has strong support for plugins. There should be very few things that an engineer couldn't figure out how to do between writing a plugin, tweaking source code, and just modifying files in place. And in the meantime I'm glad that it exists and apparently has enough traction to pay for itself.
https://www.theregister.com/AMP/2023/02/13/linux_ai_assistan...
Hopefully the troll is no longer around
Getting it to play the right thing from voice commands is a bit of a rabbit hole: https://music-assistant.io/integration/voice/
Perhaps Echo/Alexa entice users to become Prime members, and they're not meant to be market leaders. We can only speculate as outsiders.
My point is that claiming a product of one of the richest companies on Earth is not as subjectively good as the competition for financial reasons is far-fetched.
"Computer, turn lights to 50%" -> "turn lights to fifty percent" -> {action: "lights", value: 50}
"My new computer has a really beefy graphics card" -> "has a really beefy graphics card" -> {action: null}
Part of my misremembering is that I was thinking of the smaller/IoT use case, which, alongside the 10GB VRAM requirement for the large multilingual model, felt infeasible -shrug-
[1] https://git.acelerex.com/automation/opcua.ts/-/project_membe...
Looks like the GitHub is still somewhat active, although their roadmap links to a dead Trello: https://github.com/leon-ai/leon
I don't have a small house, but I'm trying to think why I would need even 5 of these, let alone 20. The majority of the time my family spends together is in the open layout on our main floor where the kitchen flows into the living room with an adjacent sun room off the living room.
I'm genuinely curious why you need so many of these.
I do agree that if you do have a legit use case for so many, buying so many in essentially a first run is a risky thing. Coupled with that, whether this will be supported for more than a fleeting couple of years is also a huge risk.
- handles it locally, if you have a fast enough computer
- sends it to home assistant cloud, if you set it up
- sends it to chatgpt, claude sonnet etc. if you set it up
I'm planning on building a Proxmox rack server next year, so I'm probably going to just handle all the discussions locally. The Home Assistant cloud is quite private too, at least that's what they say (and they're in the EU, so I think there might be truth in what they say)...

The whole point is that you control what these things do, and that you can run these things fully locally if you want with no internet access, and run your own custom software on them if that's what you want to do. This is a product for the Home Assistant community that will probably never turn much of a profit, nor do I expect it is intended to.
They recommend an N100 in the blog post, but I might buy one anyway to see if my HA box's Celeron J3455 will do the job.
9-14 devices for a 5-person household. May be a stretch, since I'm not sure my grandma could even really use it. The bathroom's a stretch too, but I'm imagining being in the shower and wanting to note multiple showerthoughts.
The short version, from the post, is that there are 4 capacitors that are only rated for 6.3v, but the power supply is 12v. Eventually one of these capacitors will fail, causing the board to stop working entirely.
It would be hard for a company to stay in business when they are fighting a patent troll lawsuit and having to handle returns on every device they sold through kickstarter.
This makes sense for cars, where there's much local stuff to control. But for a home unit, what do you want to do that is entirely local? Turning the heat up and down gets boring after a while. If it does entertainment selection or shopping, it needs outside world connections.
(Today's rant: I recently purchased a humidifier. It's just a little unit with a water tank, a water-softening filter, and an ultrasonic vaporizer. That part works fine. Then there are the controls.
All this thing really needs is an on-off switch and a humidity knob, and maybe lights for power, humidification, and water tank empty. But no. It has five touch buttons and a round display about four inches across. The display is on even if the unit is off. Pressing the on/off button turns it on. If it's humidifying, there's a whole light show. The tank lights up purple. Swooping arcs of blue run up both edges of the round display. It's very impressive, especially in a dark bedroom. If you press and hold the second button for two seconds, about half the light show is suppressed.
There are three fan speeds, and a button for that. Only the highest one will propel the water vapor high enough to avoid it hitting the floor and uselessly condensing before it mixes with the air. So that feature was not necessary.
The display shows one number. It's usually the current humidity, but if you press the humidity set button, the number displayed becomes the setting, which is changed upwards by successive presses until it wraps around. After a few seconds, the display reverts to current humidity.
Turning the unit off or removing the water tank resets all settings to the default.
This is the low-end unit. The next step up comes with an IR remote. It's one way - the remote has buttons but no display. Since you have to be close to the display to use the buttons effectively, that doesn't help much. The step up after that is, inevitably, a cloud-based phone app.
So this thing could potentially be interfaced to a voice assistant. That's only useful if there's enough information coming back from the device that the assistant software knows what the device is doing, and the assistant software understands that device status. If all it does is send remote button pushes, the result will be frustration.
So you need some degree of intelligence at both ends - the end that talks to the human, and the end that talks to the device. If the user says "House, it's too dry in here", the assistant system needs to be able to check the status of the humidifier. Has power? Talking? On? Humidity setting reasonable? Fan running? Tank not empty? If it can't do that, it's part of the problem, not part of the solution.)
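For what it's worth, HA's sentence triggers can already express that check-status-before-acting pattern; a sketch with hypothetical entity ids:

    automation:
      - alias: "Too dry"
        trigger:
          - platform: conversation
            command:
              - "it's too dry in here"
        action:
          - if:
              - condition: state
                entity_id: humidifier.bedroom
                state: "off"
            then:
              - service: humidifier.turn_on
                target:
                  entity_id: humidifier.bedroom
              - set_conversation_response: "Humidifier is on."
            else:
              - set_conversation_response: "It's already running -- maybe check the water tank."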
For example, I have my config on GitHub and share various YAML blueprints with a friend who also has the same Solar+Battery system as I do.
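A trimmed example of the kind of blueprint that's easy to share this way (names and the threshold are made up):

    blueprint:
      name: Charge from solar excess
      domain: automation
      input:
        excess_power:
          name: Solar excess power sensor
          selector:
            entity:
              domain: sensor
        charger_switch:
          name: Battery charger switch
          selector:
            entity:
              domain: switch
    trigger:
      - platform: numeric_state
        entity_id: !input excess_power
        above: 1500
    action:
      - service: switch.turn_on
        target:
          entity_id: !input charger_switch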
The main pitch of a tool like this is that I can absolutely verify it's not true.
I'm currently running a slightly different take on this (ESP32 based devices, with Whisper through the Willow inference server, with Willow autocorrect, tied into Home Assistant).
For context, it works completely offline. My modem can literally be unplugged and I can control my smart devices just fine, with my voice. Entirely on my local network, with a couple of cheap devices and a ten year old gaming PC as the server.
My data
You can test this in a couple of ways: they'll respond to their wake word when the internet is down (albeit with an error response). You can also look at the outbound data and see they're not sending continuous traffic.
Not to say with the proprietary products that they couldn't sneakily change this on the fly and record everything, maybe even turning it on for a specific device or account.
I’m currently running HA, Frigate, and Pi-hole on the same machine.
I’m assuming you can do something similar with Google Home, etc.
But like you said, you could always build your own dashboard from scratch if you wanted to.
Some notable blog posts, docs and a video on the wake words and voice assistant usage:
https://community.home-assistant.io/t/on-device-wake-word-on...
https://esphome.io/components/voice_assistant.html
https://www.home-assistant.io/voice_control/create_wake_word...
I run many self-hosted applications on my local network. Home Assistant is the only one I’m running that has its own dedicated login. Everything else I’m using has OIDC support, or I can at least unobtrusively stick a reverse proxy in front to require OIDC login.
[0] https://community.home-assistant.io/t/open-letter-for-improv...
Edit: things like this [1] don’t help either. Where one of the HA devs threatens to relicense a dependency so that NixOS can’t use it, because… he doesn’t want them to? The license permits them to. Seemed very against the spirit of open source to me.
Amazon is a business and frugality is/was a core tenet. Just because they can put Alexa in front of LLMs and use GPU hours to power it doesn't mean that is the best reinvestment of their profits.
The idea of using LLMs for Alexa is so painfully obvious that people all the way from L3 to the S-Team will have considered it, and Amazon are already doing interesting R&D with genAI, so we should assume that it isn't corporate inertia or malaise that explains why they haven't. The most feasible explanation from the outside is that it is not commercially viable, especially "free" versus a subscription model. At least with Apple (and Siri is still painfully lacking) you pay for it by being locked into the Apple ecosystem, paying thousands for their hardware, and paying eyewatering premiums for things like storage on the iPhone.
I’d love a no-wake-word world where something local was always chewing on what you said, but I’m not sure how well it would work in practice.
I think it would only take 1-2 instances of it hearing “Hey, who turned off the lights?” in a show and turning off my lights for real (and scaring the crap out of me). Doctor Who isn’t particularly scary, but if I was watching Silence in the Library and that line turned off my lights, I’d be spooked, and it would take me a hot minute to realize what happened.
An actually good product in this space IMO needs to be able to define specific sets of actions and allow agents to perform only the permitted actions.
What you describe is unquestionably a group buy, but in, for example, the mechanical keyboards community, a "group buy" is paying the designer of a thing (keyboard, keycap set, etc.) for the expense of third-party production up front. It's really more of a preorder that requires a certain volume to proceed. But regardless, they're called group buys in that hobby.
(With expected mixed results, I should add -- plenty of keyboard "group buys" never come to fruition, and since they're not backed by a Kickstarter-like platform, the money is just gone. The /r/mechanicalkeyboards subreddit has many such stories.)
Thanks; it seems I actually needed to spell that out in my post.
That's exactly why there are massive latencies between command recognition, processing, and execution.
Imagine if it had sub-ms response to "assistant, add uuh eggs and milk to the shopping list... actually no just eggs sorry"
Perhaps your definition of "private" is more stringent than most people's. Collective privacy exists, for example "The family would appreciate some privacy as they grieve". It is correct to term something "private" when it is shared with your entire household, but no one else.
Thanks for the heads up about Digital Alchemy, now I have to go and evaluate it 8)
I already have upgrades to my 3D printer sat waiting, and a massive stack of software to go through for work and home.
I've just finished replacing all the door handles in my home (long story) and the flush button on the downstairs bog. It turns out that most of my home lighting has a spare conductor available, or I can use a dimmer instead, so smart lighting is indicated at the switch. One lot done, more to do.
All of my smart IoT stuff must be locally administered, have a manual option if the network is unavailable, and, where possible, work as well as a non-smart equivalent with regard to power. So my doorbell is a Reolink job on the THINGS VLAN with no access to the internet. It is PoE powered and the switch is powered by a UPS. You get the idea.
I run my home IoT smart stuff with the same rigor as I do at work. I'm an IT consultant these days but I did study Civ Eng at college.
HA allows for plenty of solid engineering for engineers. You can do it all in the GUI with decent integrations as a "newbie", with confidence that you won't go too far wrong. You've also got a really solid Z-Wave stack alongside a well-integrated MQTT stack - how much more do you want?
There's also a Zigbee stack, which is ideal for cheap stuff. My Lidl smart switches work really well at quite a long range.
I could go on at length ... 8)
How much more engineer do you need?