FRANCESCO RODELLA | Tungsteno
"I've never loved anyone the way I love you", Theodore declares to Samantha, his futuristic voice assistant in the movie Her. Unlike this intimate relationship from science fiction, in our daily reality these artificial intelligence systems often fail to understand us (or misunderstand what we want), and many users are less than enthused.
Despite this, Siri, Alexa, Assistant, Bixby and Cortana have begun to have a considerable impact on our day-to-day lives, driven by the technology giants (Apple, Amazon, Google, Samsung and Microsoft, respectively). Already present in our smartphones, tablets, televisions and speakers, they will continue to grow in 2019, according to various analyses and studies, linked to new features in fields such as home automation.
At the moment, these assistants find it difficult, for example, to understand the speech of certain users, such as people who stutter, who speak with an accent other than the standard one in a given language, or who are bilingual. But even in more everyday use, users will notice that, in most cases, these systems are unable to hold a conversation. Why is this?
Voice assistants replicate the operation of Internet search engines. Credit: Bence Boros
Luis Alfonso Ureña, a professor in the area of Computer Languages and Systems at the University of Jaén, explains that these systems are supported by a technology called natural-language processing (NLP). Its goal is to capture a human message (in this case, speech), translate it into a programming language so that the machine can interpret it, and generate answers that we can understand. NLP can run into problems with the complex aspects of languages, such as irony, sarcasm or words with more than one meaning, according to the expert.
Specifically, current virtual assistants represent an evolution of the search-response system that supports the operation of search engines such as Google and Bing, says Antonio Moreno, from the Institute of Knowledge Engineering. "When we talk to the machine, it translates the requests and makes a search for each of the questions. But there is no link between them," he explains. The computational linguist provides an example of a failed dialogue:
User question: How's the weather today in Vancouver?
Answer from the virtual assistant: Today in Vancouver it's 20 degrees and the sky is cloudy.
Q. Do I have to carry an umbrella?
A. Results of the search "places where you can buy umbrellas in Vancouver".
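The exchange above can be sketched in a few lines of code. The following is a minimal toy illustration, not how any commercial assistant is built: the function and class names, the keyword checks and the rule that a cloudy sky calls for an umbrella are all assumptions made up for this example. The first function treats every question as a separate search, while the class remembers the previous answer and uses it to handle the follow-up.

```python
# Toy illustration only: all names, keywords and forecast data are hypothetical.

def stateless_answer(question: str) -> str:
    """Treat each question as an independent search, with no memory."""
    q = question.lower()
    if "weather" in q:
        return "Today in Vancouver it's 20 degrees and the sky is cloudy."
    if "umbrella" in q:
        return 'Results of the search "places where you can buy umbrellas in Vancouver".'
    return "Sorry, I did not understand that."


class ContextualAssistant:
    """Keeps the last weather report so a follow-up question can use it."""

    def __init__(self) -> None:
        self.last_forecast = None  # filled in after a weather question

    def answer(self, question: str) -> str:
        q = question.lower()
        if "weather" in q:
            # Made-up forecast standing in for a real weather service.
            self.last_forecast = {"city": "Vancouver", "degrees": 20, "sky": "cloudy"}
            f = self.last_forecast
            return f"Today in {f['city']} it's {f['degrees']} degrees and the sky is {f['sky']}."
        if "umbrella" in q and self.last_forecast is not None:
            # Link the follow-up to the previous turn instead of searching again.
            # Assumed toy rule: a cloudy or rainy sky means an umbrella is advisable.
            if self.last_forecast["sky"] in ("cloudy", "rainy"):
                return "It might rain, so taking an umbrella is a good idea."
            return "The sky is clear, so you can leave the umbrella at home."
        # No usable context: fall back to the one-question-one-search behaviour.
        return stateless_answer(question)
```

Asked the two questions in order, the contextual version answers the second one from the stored forecast, whereas the stateless function returns the umbrella search results quoted in the dialogue.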
The main difficulty, Moreno says, is getting virtual assistants to recognise the "infinite possible messages" in natural language. "Nor can a human master them all. You cannot ask machines for things that we are not capable of, and we are the ones who program them," he says.
Keys to reducing misunderstandings
The expert recalls that there are also assistants "focused on a specific task", such as the sale of a train or movie ticket. Narrowing the field of action allows the user's experience to track closer to that of a real conversation, he argues. "These systems work better because programmers have previously designed a dialogue tree, which has a starting point and reaches a specific place."
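A dialogue tree of the kind Moreno describes might look roughly like the sketch below. It is a simplified assumption for illustration: the node names, prompts and the "*" wildcard (meaning "accept any reply") are invented here and do not come from any real ticketing system.

```python
# Hypothetical dialogue tree: each node has a prompt and branches to the next node.
# A "*" branch accepts any reply; a node with no branches ends the dialogue.
DIALOGUE_TREE = {
    "start": {
        "prompt": "Do you want a train ticket or a movie ticket?",
        "branches": {"train": "train_destination", "movie": "movie_title"},
    },
    "train_destination": {"prompt": "Where are you travelling to?", "branches": {"*": "confirm"}},
    "movie_title": {"prompt": "Which film would you like to see?", "branches": {"*": "confirm"}},
    "confirm": {"prompt": "Shall I book it?", "branches": {"yes": "booked", "no": "cancelled"}},
    "booked": {"prompt": "Your ticket is booked.", "branches": {}},
    "cancelled": {"prompt": "No problem, nothing was booked.", "branches": {}},
}


def run_dialogue(replies):
    """Walk the tree from "start" towards a leaf, returning the prompts spoken."""
    node = "start"
    spoken = []
    pending = list(replies)
    while True:
        step = DIALOGUE_TREE[node]
        spoken.append(step["prompt"])
        branches = step["branches"]
        if not branches or not pending:
            # Either a leaf was reached or the user stopped replying.
            return spoken
        reply = pending.pop(0).strip().lower()
        if reply in branches:
            node = branches[reply]
        elif "*" in branches:
            node = branches["*"]
        # Otherwise the reply was not recognised: stay on this node and ask again.
```

Because every path starts at one point and ends at a known leaf, the assistant only ever has to recognise the handful of replies expected at each step, which is what makes the narrow task tractable.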
According to Moreno, a useful strategy is to introduce pre-programmed answers for when the assistant does not understand a sentence. In this way, the goal is to obtain a less ambiguous question from the user. Reducing misunderstandings is an objective also declared by Sherpa, the Spanish start-up in the sector. Its technology "incorporates five levels of linguistic analysis (morphological, syntactic, semantic, pragmatic and functional)" to eliminate ambiguities, says CEO Xabi Uribe-Etxebarria.
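The pre-programmed answer strategy can be illustrated with a short sketch. It uses plain keyword matching, which is a deliberately crude stand-in and not the multi-level linguistic analysis Sherpa describes; the intents, keywords and wording of the fallback are all hypothetical.

```python
# Hypothetical intents and keywords, chosen only for illustration.
INTENT_KEYWORDS = {
    "weather": {"weather", "rain", "umbrella", "temperature"},
    "tickets": {"ticket", "train", "movie", "film"},
}

# Pre-programmed answer used whenever the sentence is not understood.
FALLBACK = "Sorry, I did not catch that. Do you want the weather or tickets?"


def detect_intent(sentence: str):
    """Return the single matching intent, or None if unclear or ambiguous."""
    cleaned = sentence.lower().replace("?", " ").replace(",", " ")
    words = set(cleaned.split())
    matches = [intent for intent, keys in INTENT_KEYWORDS.items() if words & keys]
    return matches[0] if len(matches) == 1 else None


def respond(sentence: str) -> str:
    """Handle a recognised request, or ask the user for a clearer question."""
    intent = detect_intent(sentence)
    if intent is None:
        return FALLBACK
    return f"Handling a '{intent}' request."
```

A sentence that matches no intent, or that matches more than one, triggers the fallback, which steers the user towards a less ambiguous question instead of returning a guess.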
The understanding of natural language is the great challenge of voice assistants. Credit: Piotr Cichosz
For Ureña, the technology of virtual assistants such as Siri or Alexa is still "in its infancy." The professor believes that their capabilities will expand considerably in the future. Among the features that can be improved, he points to the need not only to answer "factual" questions (that is, those based on data such as when, where and who), but also to handle more complex requests, and to give answers that are precise and delivered in real time. He adds that research is even being done on how to represent more complex patterns such as feelings and emotions. "The assistants have to be adapted to us, to continue learning," he says.
How to encourage dialogue?
In fact, the machine learning systems of voice assistants are in constant training. "They listen and process the information they receive continuously, although they are activated by a keyword," says Ureña. In this way, they can expand the range of questions they recognise and gather more examples of how each one can be phrased. And as some tests show, in just one year the share of questions they answer correctly can increase considerably.
Researchers and companies working in the field of NLP now have one more challenge. "Reasoning according to common sense, understanding of natural language and the question/answer systems fed by Artificial Intelligence are very useful, but they are not yet capable of sustaining a dialogue. The deep understanding of natural language is still a challenge," says the AI Index Report, conducted by Stanford University and published last December.
Antonio Moreno believes that the big technology companies are devoting resources to meeting this challenge, but he says he is not aware of any solution close at hand. They are not wasting their time, however, as demonstrated by initiatives such as the presentation of Project Debater, an IBM system that is already able to take part in a debate with a world champion debater. One more step towards the intimate dialogue imagined by Spike Jonze in his film Her.
Tungsteno is a journalism laboratory for investigating the essence of innovation, devised by Materia Publicaciones Científicas for Sacyr's blog.