Two days ago I walked in the door, cracked a frosty beverage, and uttered a now-familiar refrain: "Alexa, what's the news?"
Silence. I asked again. Still nothing. And so I walked over to my Echo only to find that it was - gasp - unplugged (my wife has never cared for my relationship with Alexa; that's a topic for another post). So here I was, just a guy, in his kitchen, talking out loud to an imaginary woman.
Thing is, it's not the first time this has happened recently, or the only context. When I hop into my car, I find myself wanting to ask Alexa to tune the radio station or turn up the heat (Siri via CarPlay is a faint shadow of the Alexa experience). When I'm away from home, it's the same - I miss the ease and convenience of getting the forecast while my hands are busy selecting a jacket, or finding out if the Bruins are playing while I'm doing the dishes.
These are first-world problems, to be sure, but until I bought an Echo it wasn't clear to me just how many daily use cases there are for which voice is the perfect interface, nor how wide the experience delta is between a "good" voice interface (Siri) and a "great" one (Alexa).[1]
This is a pattern that's repeated itself in technology time and time again - we didn't know how tedious a DOS interface was until we were presented with a high-functioning GUI and a mouse; we didn't realize how poor the keyboards on our BlackBerrys were until we experienced a high-quality multi-touch screen; and we won't accept that voice can be a dominant interface until we experience a high-functioning version in our day-to-day lives.
But in each of these cases, the interfaces that came before end up marginalized in many ways, even if they're not put out to pasture forever.[2] And it certainly feels to me like Alexa has started to do this to many of the other interfaces I use.
[1] We still have a long way to go, and Alexa is far from perfect. But in my experience she's far more reliable, accurate, and conversational than Siri. Maybe we could come up with a standard measure for voice interfaces? Something like the percentage of first interactions where the response is the expected, correct one?
[2] Marginalized is perhaps the more negative way to look at the situation; specialized would be the more positive view. What we're really talking about is a reduction in the number of use cases for which a given interface is ideally suited. For example, just because I could dictate this post into a voice interface doesn't mean I will - the keyboard/screen combination is still a better interface for the task at hand.