At various times in the past, the teams involved in such projects have at least prototyped extremely invasive features with those in-home devices. For example, one engineer I've visited with from a well-known in-home device manufacturer worked on classifiers that could distinguish between two people having sex and one person attacking another in audio captured passively by the microphones.
As corporate culture and leadership shift over time, I have only marginal confidence that these prototypes will perpetually remain undeveloped or on-device only. Apple, for instance, has decided to send a significant amount of personal data to their "Private Cloud" and is taking the tack of opening "enough" of its infrastructure for third-party audit to make an argument that the data they collect will only be used in ways that the user is aware of and approves of. Maybe Apple can get something like that to a good enough state, at least for a time. However, they're inevitably normalizing the practice. I wonder how many competitors will be equally disciplined in their implementations.
So my takeaway is this: If there exists a pathway between a microphone and the Internet that you are not in 100% control of, it's not at all unreasonable to expect that anything and everything that microphone picks up at any time will be captured and stored by someone else. What happens with that audio will -- in general -- be kept out of your knowledge and control so long as there is insufficient regulatory oversight.