levmiseri:
WPM and other attempts to point to one specific number/metric are imo only muddying the waters. A better way to think about just how awfully slow natural language (on average) is as an interface is to think about interactions with {whatever} in terms of *intents* and *actions*.

Comparing "What's the weather in London" with clicking the weather app icon is misleading and too simplistic. When people imagine a future driven by conversational interfaces, they usually picture use cases like:

1. "When is my next train leaving?"

2. "Show me my photos from the vacation in Italy with yellow flowers on them"

3. "Book a flight from New York to Zurich on {dates}"

...

And a way to highlight what's faster/less noisy is to compare how natural language vs. mouse/touch maps onto the Intent -> Action path. The thing is that interactions like these are generally so much more complex. E.g. does the machine know what 'my' train is? If it doesn't, can it offer reasonable disambiguation? If it can't, what then? And does it present the information in a way where the next likely action is reachable, or will I need to converse about it?
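
To make that concrete, here is a minimal sketch in TypeScript of what use case 1 entails behind the scenes. Everything in it (the Context shape, resolveNextTrain, savedCommute) is hypothetical; the point is only that every slot the machine cannot resolve from context turns into another conversational round trip:

    // Hypothetical sketch: resolving "When is my next train leaving?"
    // Every slot the system cannot fill from context costs the user
    // one more conversational turn.

    type Train = { line: string; departure: Date };

    interface Context {
      savedCommute?: { from: string; to: string }; // may not exist
      location?: string;                           // may be unavailable
    }

    type Resolution =
      | { kind: "answer"; train: Train }
      | { kind: "ask"; question: string }; // one more turn for the user

    function resolveNextTrain(
      ctx: Context,
      lookup: (from: string, to: string) => Train,
    ): Resolution {
      // Does the machine know what "my" train is?
      if (ctx.savedCommute) {
        const { from, to } = ctx.savedCommute;
        return { kind: "answer", train: lookup(from, to) };
      }
      // If not, can it offer reasonable disambiguation?
      if (ctx.location) {
        return { kind: "ask", question: `Trains from ${ctx.location}: to where?` };
      }
      // If it can't, the exchange degrades into open-ended questioning.
      return { kind: "ask", question: "Which route do you mean?" };
    }

A transit app resolves the same slots implicitly: the saved commute is the default screen, so the whole Intent -> Action path collapses into one tap.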

You could picture a long table listing similar use cases in different contexts, comparing various input methods and modalities and their speed. Flicking a finger on a 2D surface or using a mouse and keyboard is going to be, on average, much faster, with fewer dead ends.
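
For illustration, one row of such a table could be modeled like this. The step counts are placeholder assumptions to show the shape of the comparison, not measurements:

    // Sketch of the imagined comparison table: one row per intent,
    // counting discrete user actions to completion and possible
    // dead ends per modality. Counts are placeholders, not data.

    type Modality = "touch" | "voice";

    interface ComparisonRow {
      intent: string;
      steps: Record<Modality, number>;    // taps/keypresses vs. spoken turns
      deadEnds: Record<Modality, number>; // clarification or retry branches
    }

    const bookFlight: ComparisonRow = {
      intent: "Book a flight from New York to Zurich on {dates}",
      steps: { touch: 9, voice: 4 },      // placeholder counts
      deadEnds: { touch: 1, voice: 3 },   // placeholder counts
    };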

Conversational interfaces are not the future. Imo even in the sense of 'augmenting', it's not going to happen. A natural-language-driven interface will always play a supporting (still important, though!) role: an accessibility aid for when you're temporarily, permanently, or contextually unable to use the primary input method to 'encode your intent'.