Jason Mars is an African-American professor of computer science who also runs a tech startup. When his company’s artificially intelligent smartphone app talks, he said, it sounds “like a helpful, young Caucasian female.”
“There’s a kind of pressure to conform to the prejudices of the world” when you are trying to make a consumer hit, he said. “It would be interesting to have a black guy talk, but we don’t want to create friction, either. First we need to sell products.”
Mars’ startup is part of a growing high-tech field called conversational computing. This technology is being popularized by programs like the Siri system in Apple’s iPhone, and Alexa, which is built into Amazon’s artificially intelligent home computing device, the Echo.
Conversational computing is holding a mirror to many of society’s biggest preconceptions around race and gender. Listening and talking are the new input and output devices of computers. But they have social and emotional dimensions never seen with keyboards and screens.
Do we, for example, associate the stereotypical voice of an English butler — think of Jarvis the computer in “Iron Man” — with a helpful and intelligent person? And why do so many people want to hear a voice that sounds like it came from a younger woman with no discernible accent?
Choosing a voice has implications for design, branding and how we interact with machines. A voice can change or harden how we see each other. Where commerce is concerned, that creates a problem: Is it better to succeed by complying with a stereotype, or risk failure in the market by going against type?
For many, the answer is initially clear. Microsoft’s artificially intelligent voice system, Cortana, for example, originated as the voice of a female character in the video game “Halo.”
“In our research for Cortana, both men and women prefer a woman, younger, for their personal assistant, by a country mile,” said Derek Connell, senior vice president for search at Microsoft. In other words, a secretary — a job that is traditionally seen as female.
Last week, Google introduced a number of voice-based products, including Google Home, its version of Echo. All of them use Google Assistant, which also speaks in tones associated with a young, educated woman.
Google Assistant “is a millennial librarian who understands cultural cues, and can wink at things,” said Ryan Germick, who leads the personality efforts in building Google Assistant. “Products aren’t about rational design decisions. They are about psychology and how people feel.”
Microsoft has had internal debates about how to respond when users ask the computer about suicide, Connell said. “We’ve leaned toward providing information about suicide prevention everywhere,” he said, as opposed to offering no advice at all.
But sometimes, if you want people to figure out quickly that they are talking to a machine, it can be better to have a man’s voice. For example, IBM’s Watson, when it talks to Bob Dylan in television commercials, has a male voice. When Ashok Goel, a professor at the Georgia Institute of Technology, adapted Watson to have a female voice as an informal experiment in how people relate to conversational machines, his students couldn’t tell it was a computer.
But Watson’s maleness is the exception. Amazon’s A.I. technology is also in the comforting-female-voice camp.
“Alexa was always an assistant, and female,” said Peng Shao, who worked at Amazon on the Echo and is now at a Seattle startup, building another speech-based A.I. system. Amazon would not comment on its product.
We don’t just need that computerized voice to meet our expectations, said Justine Cassell, a professor at Carnegie Mellon’s Human-Computer Interaction Institute. We need computers to relate to us and put us at ease when performing a task. “We have to know that the other is enough like us that it will run our program correctly,” she said.
That need seems to start young. Cassell has designed an avatar of indeterminate race and gender for 5-year-olds. “The girls think it’s a girl, and the boys think it’s a boy,” she said. “Children of color think it’s of color, Caucasians think it’s Caucasian.”
Another system she built spoke in what she termed “vernacular” to African-American children, achieving better results in teaching scientific concepts than when the computer spoke in standard English.
When tutoring the children in a class presentation, however, “we wanted it to practice with them in ‘proper English.’ Standard American English is still the code of power, so we needed to develop an agent that would train them in code switching,” she said.
Mars’ company, called Clinc, makes personal financial smartphone software that answers questions like “how much can I spend on a computer?” It relies on a similar Google-created female voice.
He is hoping for enough success that he can eventually test and counter stereotypes with unexpected A.I. voices. “You need to be at a certain size before you can address these questions,” said Mars, who teaches at the University of Michigan.
But maybe not too big. “I think consumers will eventually be open to exploring different voices and types,” he said. “Companies, they’ll probably stay conservative about it.”