We’re seeing more applications of conversational AI through in-home speakers, in-car assistants, and on our phones. These dialogue systems are interfaces that users interact with through language.
To be effective, dialogue systems need to master language: they must understand users’ sentences, but they also need to communicate and generate new sentences of their own.
To achieve these capabilities, the research community has been exploring a two-step process. First, given dialogue history, the system learns to generate sentences that are sensible in this context. If I’m asking several questions about movies playing at my local theatre (films, times, ticket availability), the system needs to remember and use this contextual information.
Beyond this, the system learns to select between the different possible sentences depending on what it is trying to achieve. For instance, a system which tries to maximize user engagement learns to ask questions to keep the dialogue moving forward.
One can think of this process as language acquisition followed by learning to accomplish tasks through language. The first step, language acquisition, is often performed by supervised learning: the system observes sentences produced by humans and learns to map one speaker’s sentence to the other speaker’s response. After observing many sentence-response pairs, the system can produce reasonable responses for a given dialogue context.
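To make the supervised step concrete, here is a minimal sketch of learning from observed sentence-response pairs. A real system would train a neural sequence-to-sequence model; this word-overlap retrieval stand-in only illustrates the same idea, and the training pairs are invented for the example.

```python
# Toy illustration of the supervised step: observe (sentence, response)
# pairs, then answer a new sentence by retrieving the response whose
# observed context shares the most words with it.

def word_overlap(a: str, b: str) -> int:
    """Count words shared between two sentences."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def respond(history: str, pairs: list[tuple[str, str]]) -> str:
    """Pick the response paired with the most similar observed context."""
    best_context, best_response = max(
        pairs, key=lambda pair: word_overlap(history, pair[0])
    )
    return best_response

# Hypothetical sentence-response pairs observed in dialogue data.
training_pairs = [
    ("what movies are playing tonight", "We have Arrival at 7pm and 9pm."),
    ("how much is a ticket", "Tickets are $12 each."),
    ("are seats still available", "Yes, there are seats left for both showings."),
]

print(respond("which movies are playing", training_pairs))
```

The retrieval trick stands in for what a trained model does: generalize from observed pairs to a sensible response for a context it has never seen verbatim.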
The next step consists of generating words, not only based on what would be reasonable, but also based on the system’s goal. This goal might be maximizing user engagement, helping a user to accomplish a task such as booking a movie ticket, or playing a game of 20-questions.
The system is trained to perform this step through reinforcement learning. During this step, the system learns how to plan the responses it generates based on the long-term outcome that they will have. For instance, in the case of maximizing user engagement, the system learns that asking questions results in longer dialogues compared to generating declarative responses.
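A heavily simplified sketch of this reinforcement-learning step, assuming a two-action setting and an invented dialogue-length reward: the system chooses between asking a question and making a statement, and learns from simulated outcomes that questions lead to longer dialogues. Real systems score full generated sentences rather than two fixed styles.

```python
import random

# Toy RL loop: estimate the long-term dialogue length of each response
# style with an epsilon-greedy bandit. The action set and reward numbers
# are hypothetical, chosen only to illustrate the mechanism.

random.seed(0)

actions = ["ask_question", "make_statement"]
value = {a: 0.0 for a in actions}   # estimated long-term dialogue length
counts = {a: 0 for a in actions}

def simulated_dialogue_length(action: str) -> float:
    """Stand-in user model: questions tend to extend the dialogue."""
    base = 6.0 if action == "ask_question" else 3.0
    return base + random.uniform(-1.0, 1.0)

for step in range(500):
    # Epsilon-greedy: mostly exploit the best-looking action, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(actions)
    else:
        action = max(actions, key=value.get)
    reward = simulated_dialogue_length(action)
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]  # running mean

print(max(actions, key=value.get))  # the learned preference
```

After a few hundred simulated dialogues, the estimated value of asking questions exceeds that of making statements, which is exactly the kind of planning knowledge the RL step is meant to acquire.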
A major challenge with teaching a dialogue system to plan is that a user simulator is needed: the system is going to generate new sentences compared to the ones observed in the data so it is necessary to simulate how a user would respond to these sentences to estimate their long-term outcome.
In other words, to train a dialogue system efficiently… we need a dialogue system! To overcome this limitation, research has moved towards making systems learn faster so that they can learn by communicating with humans directly.
We’re working towards a future where users program their AI assistants through language, by directly giving them feedback on how they perform.
Layla El Asri, Research Manager, Maluuba