The demonstration that Sundar Pichai gave at the Google I/O 2018 conference left us all amazed. The technology certainly made its capabilities clear, but the debate about privacy, transparency and misuse is inevitable.
A robotic voice that sounds completely human
We were barely 35 minutes into the conference when Sundar Pichai started talking about Google Assistant. The system, he said, was meant to solve a common problem: small businesses that have no automated online reservation system. Then came the demonstration (minute 35:00).
That call, in which a machine spoke with a person and did so in a completely natural way, marks a turning point. It is the point at which natural language understanding, deep learning and text-to-speech show for the first time that they can fool us into thinking we are talking to a real person.
The hairdresser who took the call never suspected that the caller was a synthesized voice. Those pauses, those “ahmmm …” and those “mm-hmm …” helped the robotic voice pass for a human one, with human intonation and the same pauses and hesitations that we humans usually make when talking.
As Pichai pointed out, the system is the result of several years of work in these areas. He showed another example, a call to a restaurant in which the reservation was not actually made (“we do not take reservations for fewer than 5 people, you can come directly, and there will be room”), yet the assistant still got what the user wanted: not having to make the call, and getting that reservation (or something close to it).
Neural networks for speech
As explained on the Google AI blog, Duplex, Google’s conversational system, is based on a Recurrent Neural Network (RNN), a technology we discuss in depth here, and was built using TensorFlow Extended (TFX).
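To give a rough idea of what "recurrent" means here, the following is a minimal sketch of a single-unit recurrent cell in plain Python. It is not Duplex's architecture: the weights `W_IN`, `W_REC` and `BIAS` and the functions `rnn_step` and `run_rnn` are made up for illustration. It only shows how a hidden state carries earlier inputs forward, so that the same final input can produce a different result depending on what came before it.

```python
import math

# Hypothetical fixed weights; a real network learns these during training.
W_IN = 0.8    # weight applied to the current input
W_REC = 0.5   # weight applied to the previous hidden state
BIAS = 0.0


def rnn_step(x, h_prev):
    """One recurrent step: mix the new input with the remembered state."""
    return math.tanh(W_IN * x + W_REC * h_prev + BIAS)


def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the cell and return every hidden state."""
    states = []
    h = h0
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states


if __name__ == "__main__":
    # Both sequences end with the same input (1.0) but start differently,
    # so the final hidden states differ: the cell "remembers" the context.
    print(run_rnn([1.0, 0.0, 1.0])[-1])
    print(run_rnn([-1.0, 0.0, 1.0])[-1])
```

In a conversational system the inputs would be learned representations of words or audio frames rather than single numbers, but the principle of carrying state from one step to the next is the same.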
As with other similar systems, the neural network had to be trained on anonymized telephone conversations to reach this level of conversational precision. The system uses Google’s Automatic Speech Recognition (ASR) technology and analyzes several parameters to work out the context and understand what the other party is saying. It is even able to understand when it is being interrupted, and why.
To make the voice sound natural, a text-to-speech (TTS) system based on Tacotron and WaveNet is used to control the intonation. The most interesting part is the introduction of so-called ‘speech disfluencies’, those ‘ahm’ and ‘uhm’ pauses that are common when people speak, and which made the synthesized voice even more convincing when it came to passing for human.
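The idea of sprinkling fillers into a response before it is spoken can be illustrated with a toy sketch. This is not how Duplex, Tacotron or WaveNet work internally: the function `add_disfluencies`, the `FILLERS` list and the `probability` parameter are hypothetical, and a real system would presumably place hesitations where the context calls for them rather than at random.

```python
import random

# Hypothetical filler inventory, taken from the examples in the text.
FILLERS = ["ahm", "uhm", "mm-hmm"]


def add_disfluencies(text, probability=0.2, seed=None):
    """Insert a filler before some words of `text`.

    Each word is preceded by a random filler with the given probability.
    Passing a seed makes the output reproducible.
    """
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < probability:
            out.append(rng.choice(FILLERS) + "...")
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    sentence = "I would like to book a table for four people"
    print(add_disfluencies(sentence, probability=0.3, seed=7))
```

The resulting string would then be handed to the speech synthesizer, which is the component that actually determines how natural the pauses sound.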
The system is able to hold conversations “fully autonomously, without human intervention”, Google explains, but it also includes a monitoring system that alerts a human operator when a task could not be completed.
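A fallback of this kind can be sketched as a simple escalation rule. Everything below is an assumption made for illustration: the source does not say how the monitoring decides to hand over, so the confidence scores, the `CONFIDENCE_THRESHOLD` and `MAX_LOW_CONFIDENCE_TURNS` values and the function `run_call` are all hypothetical.

```python
# Hypothetical settings: how unsure the system may be, and for how long,
# before it gives up and asks a human operator to take over.
CONFIDENCE_THRESHOLD = 0.6
MAX_LOW_CONFIDENCE_TURNS = 2


def run_call(turns):
    """Decide the outcome of a call.

    `turns` is a list of (transcript, confidence) pairs, where confidence
    is an assumed score between 0 and 1 for how well the system understood
    that turn. The call is escalated once too many consecutive turns fall
    below the threshold; otherwise it is considered completed.
    """
    low_streak = 0
    for _transcript, confidence in turns:
        if confidence < CONFIDENCE_THRESHOLD:
            low_streak += 1
            if low_streak >= MAX_LOW_CONFIDENCE_TURNS:
                return "escalated_to_operator"
        else:
            low_streak = 0  # a well-understood turn resets the streak
    return "completed"


if __name__ == "__main__":
    call = [
        ("What time would you like?", 0.9),
        ("(unintelligible)", 0.3),
        ("(unintelligible)", 0.2),
    ]
    print(run_call(call))
```

Counting consecutive low-confidence turns, rather than reacting to a single one, is one possible design choice: it tolerates an occasional misheard sentence without giving up on the whole call.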
Google Duplex is not, of course, designed for now to talk about just anything. It is designed for booking services, which narrows the range of questions and answers and makes everything more manageable.
Transparency, privacy and misuse
Listening to Google Duplex in action is really impressive, and some are in fact asking whether this system could pass the Turing test, at least in conversation. Broadly speaking, probably not, but these conversations do suggest that this kind of voice synthesis and natural language recognition could fool anyone.
The last of those tweets raises an important point: shouldn't Google warn you that you are talking to a machine? This is a debate that goes beyond technology and reaches into fields such as ethics (including the ethics of robotics) and philosophy. Many analysts, experts and ordinary users voiced their doubts on Twitter about a system that did manage to complete the task, but did so through deception, however harmless that deception may be.
Google has since indicated that “we are designing this feature with the integration of warning messages”, which suggests that in the final implementation the system would indeed tell the human on the line that they are talking to a machine.
This impressive advance has other dark spots, such as the privacy of the conversations used to train the system. It is likely that Google Duplex will record the entire conversation and analyze it (after anonymizing it) to “improve the service”, as all these systems usually state. Still, suspicion about what can be done with all those recordings is understandable. Here, too, Google must be transparent about what is saved, how it is saved and for how long.
Finally, there is the problem of misuse. As always, it will be difficult to prevent a tool from being used for harm, and that should not stop this technological evolution (stopping it would be a mistake). But if Google rolls out this technology, it will be important to know how it protects us from fraudulent uses that could be automated, such as deceptive telemarketing calls.
We will see where all this takes us, but we are certainly looking at one of the great technological surprises of the year so far. We hope that it is implemented appropriately and that, as Google claims, the idea is used to improve our lives and not to make them worse.