Originally published on the Kicker Studio blog.
I hate technology. Seriously, nothing can make me as angry as technology can, because technology doesn’t care about me. I mean this literally. It has no emotion, so it can’t care. What’s worse is it can’t even fake it properly. It’s totally oblivious! It doesn’t empathize with me and doesn’t respond appropriately for the situation, not even if I’m finding out I have a deadly disease. “Sorry, I don’t understand hemorrhaging. Please try again.”
In these moments, when I’m losing my shit, technology needs to talk calmly and clearly. Instead, it blithely repeats every irrelevant option it always offers, which makes as much sense as these ridiculous Americans who speak English in a really loud voice to foreigners, thinking that if they just yell, they’ll be understood… but at least you can smack those people. Technology you can’t even smack without wasting a lot of money because now you’ve gone and broken the damn thing.
When someone is stressed, they lose their ability to focus and understand what is happening around them. In these moments, the sympathetic nervous system kicks in and the brain goes myopic on us, shutting out choices and scrambling all sensory input, except for the info you need for survival. Blood boils, eyes pop out of heads, breathing gets shallow, and in the mind’s eye, a team of guys in track suits starts jumping up and down, screaming “Run! Run for your lives!”
Technology couldn’t care less about us humans and our stressed-out lives. This is made all the more infuriating when it’s an NUI (natural user interface) technology – which is designed so that we can relate to it the way we do living things. Here, the betrayal is twice as bad, because NUI technology should know better, for god’s sake. Voice is a great example because it sort of sounds cognizant, suggests embodiment, and reminds us of ourselves. That makes it that much more egregious a crime when voice technology fails us utterly, kicks us in the nuts when we’re already down, and doesn’t even realize it’s doing anything wrong.
Humans respond to a person in stress empathetically. If someone is frantically asking us for help, we try to calm them and we ask them simple questions. But voice interface is not like that. At all. Turns out, voice interface is like the Honey Badger.
Personally, I’ve been excited about voice interface for a long time. I get very bad motion sickness and can’t use technology in any moving vehicle, not even if I’m a passenger. Looking down causes me to be immediately ill. So I was really rooting for my new voice interface on my fancy new Android phone.
The other day, I was trying to pick up a friend of mine at the airport. Traffic was crazy on the freeway and I needed to let her know I would be there a few minutes later than we’d planned. I can’t text and drive, so I decided to use my voice interface. I double pressed the button to wake up the voice and I said “Text Nora I’m on my way”. I waited a long time. Nothing happened. I looked down and saw the words “Network Error”.
I did a bit of deep breathing, crossed my fingers and repeated the process. This time the voice heard me, but couldn’t find the contact. Nora is a frequently used contact, so this made no sense. I asked again. Again, it insisted that there was no Nora in the contacts. It even spelled Nora correctly when it informed me of this. My grip on the steering wheel tightened. Ok, so now you’re just f**king with me, right? I pulled over, found the contact by hand, and used the voice feature again to send a text, and it worked. Ok. Fine. Voice interface had some flaws, but once I found the contact info manually, thereby giving it visual parameters, it worked, basically… sort of.
Much later, I was driving back from a client meeting and had some ideas I really wanted to get down. For years, I’ve desperately wanted a voice-to-text service that I could use in the car. I typically get ideas while driving and have always wished I could just voice-text them to myself. So here we go! I activated the voice interface and said “record voice note”. This is what happened: for about 45 minutes, every time I woke up the voice interface, it would ask me “Who would you like to message?” I could not get it to cancel. I could not get it to change the topic. It ignored requests for help, that useless scum. In fact, that incorrigible shit ignored every attempt I made to get out of that mode, and who even knows what “that mode” was?! I even restarted the phone. I was crawling in traffic fantasizing about murder.
At the end of this exchange, I was in tears. Frustrated tears, staving off road rage, and stuck next to a police officer, so no looking down. Jesus, I really needed to take down the notes, and at this point I cannot for the life of me remember what revelatory information I wanted so badly to record. Instead, I came home and wrote this post.
So yeah, my voice interface and I are still in a fight. I mean, imagine the scenario I just described as a conversation with someone sitting with you in the car. That idiot would’ve gotten a swift kick in the ass and been jettisoned to the curb. Game over. You think you’re funny, stupid voice? Well hahaha. Looks like you’ll be walking home, buddy.
So like I said, I hate technology, and here’s the challenge. How about we design voice technology that actually works the way a conversation does? How about voice technology that would be responsive not only to my words, but also to my tone and the context of our conversation, the same way a person would?
Studies have shown that when people interact with a non-empathetic voice interface while driving, especially if they’re in a heightened emotional state of either happiness or distress, they’re twice as likely to crash their car. This is just awful. We might as well just be texting while driving!
However, the good news is that emotion detection technology actually exists, and it’s quite good at detecting emotion at either end of the happy-upset continuum. Google is proving that it’s possible to design emotion- and context-sensitive voice technology, based on its research, which shows that humans really do behave within fairly reliable patterns. This makes it possible for voice technology to provide relevant data to the user when he or she is most likely to want it. Adding this type of predictive/contextual analysis to voice interface will make using it way more worthwhile.
What’s more, we can train this same voice technology to respond to our specific tone and cues, making it customizable for each user. We can now say to our devices, “Learn my language.” Chances are, I make the same request the same way, every time.
Learn it. Big dog did it. You can too.
Written by Jody Medich & Wendy Rolon