Conversational UX Design: Creating Persistent Conversations
1. Conversations with Systems: Beyond Natural Language Processing
Anyone who interacts on a daily basis with virtual assistants and chatbots will know that one of the main characteristics of these conversations is the fragmented nature of the interactions.
Although conversational interfaces should, in principle, be transparent to the user—since they use natural language—the reality is that computers are “still learning how to speak”. As a result, humans find that they need to adapt to them by experimenting with different questions and commands in order to understand the scope of the agent’s capabilities.
This learning process is often frustrating for users: the system is unable to recover naturally from errors, access to the set of topics that can be discussed with the assistant is not intuitive, and conversations cannot be maintained for longer than two or three turns. As a result, users often give up quickly on conversational exchanges with intelligent assistants and limit their use to just a few simple tasks.
But why is it so difficult to enable systems to converse with us properly, especially given that Natural Language Processing has led to so many advances in the ability of computers to process, analyze, and make sense of large amounts of natural language data? The answer is that there is something else beyond language that must be modelled in a conversation in order to maintain a persistent interaction with a machine.
To be more precise, there are two main components in human interaction that are still not sufficiently taken into account in the field of human-machine interaction: 1) the structured nature of human conversation, and 2) the need to establish and maintain a mutual understanding with the system even before beginning a conversation with it.
The goal of creating structured conversations with conversational agents, and the attempt to establish an interactive context that guarantees a fluid understanding between user and machine, fall under the purview of the field of Conversational Design. In this post, I will argue that maintaining a human-centered focus in conversational design can bring a number of specific benefits to the functioning of conversational agents.
2. The Interactive Structure of Conversation
Although human conversation can, at first sight, seem chaotic, in fact it is a collaborative process in which interlocutors exchange turns in an organized manner. These turns are shaped step by step into sequences that follow standardized structures.
For example, in the opening of a conversation, the structural place for how-are-you-questions canonically follows the turns dedicated to greetings, as in “Hi Pablo! How are you today?”. If somebody were to greet me with a sentence that reversed this canonical order (“How are you today? Hi Carmen!”), I would still understand the words, but I would need to reinterpret the behaviour, expending additional cognitive effort to infer the speaker’s intention (“Is he trying to be funny by speaking in a weird way?”).
If a system produced such an awkward opening, the user’s reaction would be completely different from that of a human participant. The user would immediately lose trust in the conversational capabilities of the assistant or chatbot and in the quality of the services that the agent is supposed to provide. This would happen because destructured exchanges of this type disrupt the common ground that the human and the system must share if they are to succeed as conversational partners. In a human-to-human conversation, the assumption would be that the other person is playing a game or making some kind of point; in a human-to-system conversation, the same destructuring would be perceived as an error, with very different consequences.
3. Creating Common Ground with a System
Common ground is the second part of that “something else” in natural conversation that conversational UX design aims to develop and integrate into conversational systems.
In human verbal interaction, conversational partners typically signal at every turn whether they understand each other. In addition, each human builds the meaning of their next turn around the content of what the other person has said in the previous turn. This process of signalling understanding and reflecting that understanding back generates common ground between the interlocutors. This common ground serves as a resource that allows participants to access and share the same knowledge, derived both from the local progression of the conversation and, more generally, from the knowledge that the interlocutors share about the world.
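The grounding process described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a description of any existing system: the `CommonGround` class, its methods, and the set of understanding signals are hypothetical, and a real agent would need a language-understanding component to recognise such signals. The sketch assumes that a contribution only becomes shared once the listener explicitly signals understanding.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical listener signals that count as displaying understanding.
# A real system would need an NLU component to recognise them.
UNDERSTANDING_SIGNALS = {"ok", "okay", "right", "got it", "i see", "mm-hm"}


@dataclass
class CommonGround:
    """Illustrative record of what both interlocutors can treat as shared."""
    shared_world: List[str] = field(default_factory=list)  # general knowledge about the world
    grounded: List[str] = field(default_factory=list)      # content grounded in this conversation
    pending: Optional[str] = None                          # last contribution awaiting a signal

    def contribute(self, content: str) -> None:
        """A speaker presents content; it is not shared yet."""
        self.pending = content

    def listener_signals(self, utterance: str) -> bool:
        """Ground the pending contribution if the listener signals understanding."""
        signal = utterance.strip().lower().rstrip(".!")
        if self.pending is not None and signal in UNDERSTANDING_SIGNALS:
            self.grounded.append(self.pending)
            self.pending = None
            return True
        # Any other response (e.g. "What do you mean?") leaves the contribution
        # pending; this is the point at which a repair sequence would begin.
        return False

    def last_shared(self) -> Optional[str]:
        """The most recent grounded content, which the next turn can build on."""
        return self.grounded[-1] if self.grounded else None


cg = CommonGround(shared_world=["restaurants take table bookings"])
cg.contribute("a table for two at 8 pm")
print(cg.listener_signals("Sorry, come again?"))  # False: still pending
print(cg.listener_signals("Got it."))             # True: now common ground
print(cg.last_shared())                           # a table for two at 8 pm
```

Keeping `pending` separate from `grounded` mirrors the idea that content is offered in one turn but only becomes common ground once the other participant signals it has been understood.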
When the conversational partner is a conversational software agent, the system’s speech turns automatically become part of the user’s mental model, and the interactive structure becomes the reference against which the user evaluates how efficiently the system can handle a conversational exchange. As a consequence, if the agent’s responses are incoherent or its strategies fail to maintain common ground with the user, the user will most likely conclude that the conversation with the computer is inefficient and abandon the interaction.
4. Human-Centered Conversational UX Design
The abandonment of a conversation is a negative outcome for the user, just as it is for the system. On the one hand, the user loses the convenience of being able to access information simply by maintaining a conversation with a conversational agent. On the other, the system is unable to engage in continued interaction with users and thus loses access to the more frequent and extended conversations that translate into better-quality data, which can then be tracked and applied to improve performance.
To solve this problem, conversational UX design combines AI advances with linguistic insights into the nature of human conversation to create persistent conversations with computer agents. This human-centered approach to AI design aims to make human models of conversation prevail wherever the system is able to process and respond to the strategies humans use to build multi-turn sequences and generate common ground.
4.1 Conversational Structures as Conversational Components
Turning back now to the first “something else” of human conversation, namely its structured nature, the conversational designer needs to design for interactive structures. This means creating human-to-machine conversations in which the conversational components that articulate natural language are adapted into the “mental” model of the system.
These structures act as conversational components that organize themselves in two textual dimensions: the global and the local level of the conversation.
At the global level, the most relevant macro-components that the system has to be able to handle are:
- The opening of the conversation, where interlocutors exchange greetings and ritualistic how-are-you-questions after they establish
- The topical core of the conversation, which contains the semantic content of the interaction. Its subcomponents are topical sequences that alternate in accordance with the rules of the topic-change mechanism. For machines, the most relevant topical sequences to handle are:
  - Sequences generated by questions, requests, and offers.
  - Repair sequences, which enable the re-establishment of common ground after an error.
- The closing of the conversation, the last part of the interaction, in which speakers formally transition into the end of their interaction and finally exchange farewells.
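As a rough illustration of how a system could represent this global structure, the Python sketch below encodes the macro-components as phases with a table of canonical transitions, which also captures the ordering of greetings and how-are-you-questions discussed in section 2. The `Phase` names, the `ALLOWED` table, and `is_canonical` are hypothetical. The sketch assumes that every conversation opens with greetings and that a pre-closing may reopen the topical core, and it checks only the order of phases, not whether the conversation has been completed.

```python
from enum import Enum, auto
from typing import Dict, List, Set


class Phase(Enum):
    GREETING = auto()      # opening: exchange of greetings
    HOW_ARE_YOU = auto()   # opening: ritual how-are-you-questions
    TOPICAL_CORE = auto()  # topical sequences, including request and repair sequences
    PRE_CLOSING = auto()   # closing: formal transition towards the end
    FAREWELL = auto()      # closing: exchange of farewells


# Hypothetical table of canonical successors. The how-are-you exchange is
# optional, but it may only follow the greetings.
ALLOWED: Dict[Phase, Set[Phase]] = {
    Phase.GREETING: {Phase.HOW_ARE_YOU, Phase.TOPICAL_CORE},
    Phase.HOW_ARE_YOU: {Phase.TOPICAL_CORE},
    Phase.TOPICAL_CORE: {Phase.TOPICAL_CORE, Phase.PRE_CLOSING},
    # Assumption: a pre-closing may reopen the topical core.
    Phase.PRE_CLOSING: {Phase.TOPICAL_CORE, Phase.FAREWELL},
    Phase.FAREWELL: set(),
}


def is_canonical(phases: List[Phase]) -> bool:
    """Check that consecutive phases follow the canonical order (completeness is not checked)."""
    # Assumption: every conversation opens with greetings.
    if not phases or phases[0] is not Phase.GREETING:
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(phases, phases[1:]))


# "Hi Pablo! How are you today?", then a topical core and a normal closing:
print(is_canonical([Phase.GREETING, Phase.HOW_ARE_YOU, Phase.TOPICAL_CORE,
                    Phase.PRE_CLOSING, Phase.FAREWELL]))  # True
# "How are you today? Hi Carmen!" reverses the canonical order of the opening:
print(is_canonical([Phase.HOW_ARE_YOU, Phase.GREETING, Phase.TOPICAL_CORE]))  # False
```

A transition table keeps the structural rules in one place, so a designer can inspect or adjust which macro-components may follow one another without touching the rest of the dialogue logic.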
At the local level of the conversation, the most relevant conversational components are:
- Speech turns: divided between speakers’ turns and listeners’ backchannels, which generally alternate with one another following the rules of the turn-taking mechanism.
- Adjacency pairs: two turns that are linked by a relationship of conditional relevance, as happens in the case of questions and answers.
- Expansions of the adjacency pair: microstructures that complete the basic adjacency pair by turning it into a topical sequence.
The progressive structuring of speech turns and backchannels into adjacency pairs, expansions of adjacency pairs, topical sequences, and macro-sequences is what turns an exchange between interlocutors into an organized, cohesive, and coherent conversation.
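The local components can be sketched in a similar way. Assuming each turn has already been labelled with a speaker and a turn type (the labels and `pairs_complete` below are hypothetical, not a standard API), this Python function checks conditional relevance: every first pair part must receive one of the second pair parts it makes relevant, produced by the other speaker. Open pairs are kept on a stack, so an expansion inserted between the two parts of a pair, such as a clarifying question, must be resolved before the original pair can close. Backchannels are skipped because they accompany a turn rather than open or close a pair.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical turn types: each first pair part maps to the second pair
# parts it makes conditionally relevant.
PAIRS: Dict[str, Set[str]] = {
    "greeting": {"greeting"},
    "question": {"answer"},
    "request": {"grant", "refusal"},
    "offer": {"acceptance", "rejection"},
}


def pairs_complete(turns: List[Tuple[str, str]]) -> bool:
    """Return True if every first pair part in `turns` receives a fitting second part.

    `turns` is a list of (speaker, turn_type) tuples. Open pairs are kept on a
    stack, so an inserted expansion must be closed before the pair it expands.
    """
    open_pairs: List[Tuple[str, str]] = []
    for speaker, kind in turns:
        if kind == "backchannel":
            continue  # backchannels neither open nor close a pair
        if open_pairs:
            first_speaker, first_kind = open_pairs[-1]
            # Conditional relevance: the expected type, produced by the other speaker.
            if kind in PAIRS[first_kind] and speaker != first_speaker:
                open_pairs.pop()
                continue
        if kind in PAIRS:
            open_pairs.append((speaker, kind))  # new first pair part or inserted expansion
        else:
            return False  # a response that no open first pair part made relevant
    return not open_pairs


# A request expanded by a clarifying question before it is granted:
turns = [("user", "request"), ("system", "question"),
         ("user", "answer"), ("system", "backchannel"), ("system", "grant")]
print(pairs_complete(turns))  # True
```

The stack is a deliberate simplification: it treats expansions as strictly nested, which fits insert sequences such as clarifying questions but not every way in which real conversations expand an adjacency pair.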
A system that does not adapt to this structure of natural conversation will not be able to interact successfully with a human. Addressing this problem is the task of a human-centered approach to conversational design.
4.2 Human-centered Design of Common Ground Strategies
Returning to the common ground shared by participants in a conversation: in human-to-human conversation, listening errors, speaking errors, and cognitive errors are the most typical causes of disruption to the common ground between interlocutors. But humans have the ability to repair their conversational mistakes. A repair is the action of fixing an error so that the conversation can get back on track.
Repair occurs through a repair sequence, the group of speech turns exchanged between the listener and the speaker that solves the interactional problem and brings interlocutors back to the topic they were discussing.
During this process, interlocutors show a high degree of conversational collaboration, despite being in an error state. Phrases such as “What do you mean?”, “Sorry, come again?” or “Could you repeat that, please?”, and the repair sequences that follow from them, are very effective strategies for bringing interlocutors back to the topic they were discussing before the error appeared, and thus for restoring mutual understanding.
By contrast, intelligent assistants use repair generators such as “Sorry, I’m not sure how to help with that, but I’m still learning”, “Sorry, I don’t understand”, or “Sorry, I can’t help with that yet”. These expressions do not elicit additional information that would allow the system to reinterpret the user’s input, do not move either interlocutor out of the error state, and, ultimately, only serve to end the conversation.
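To make the contrast concrete, here is a minimal Python sketch of a response policy that initiates repair instead of closing the exchange. It assumes a toy keyword matcher in place of a real NLU model, and the intents, labels, `DialogueState`, and `respond` are all hypothetical. When the input matches several intents, the policy offers the competing readings; when it matches none, it keeps the previous topic in view so that the user can return to it.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical intents: internal name -> (label shown to the user, trigger words).
# A keyword matcher stands in for a real NLU model.
INTENTS: Dict[str, Tuple[str, Set[str]]] = {
    "book_table": ("booking a table", {"book", "table", "reservation"}),
    "opening_hours": ("our opening hours", {"open", "hours", "close"}),
}


@dataclass
class DialogueState:
    topic: Optional[str] = None  # topic under discussion before any error


def candidate_intents(utterance: str) -> List[str]:
    """Return every intent whose trigger words appear in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    return [name for name, (_, keys) in INTENTS.items() if words & keys]


def respond(utterance: str, state: DialogueState) -> str:
    candidates = candidate_intents(utterance)
    if len(candidates) == 1:
        state.topic = candidates[0]
        return f"OK, let's continue with {INTENTS[state.topic][0]}."
    if len(candidates) > 1:
        # Targeted repair: offer the competing readings instead of giving up.
        labels = [INTENTS[name][0] for name in candidates]
        return "Do you mean " + " or ".join(labels) + "?"
    if state.topic is not None:
        # Keep the previous topic in view so the user can return to it.
        label = INTENTS[state.topic][0]
        return f"Sorry, I didn't catch that. Are we still talking about {label}?"
    # No topic yet: ask for a reformulation and show what can be discussed.
    options = " or ".join(label for label, _ in INTENTS.values())
    return f"Sorry, could you say that another way? I can help with {options}."


state = DialogueState()
print(respond("I want to book a table", state))
# OK, let's continue with booking a table.
print(respond("hmm, the thing", state))
# Sorry, I didn't catch that. Are we still talking about booking a table?
```

Storing the topic in `DialogueState` is what allows the repair turn to point back to the conversation in progress rather than resetting it, which is the function the generic fallback messages above fail to serve.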
In this area of human-machine interaction, conversational design should focus on applying human repair techniques in a way that advances our knowledge of, and methods for, building common ground with conversational agents. The relevant techniques also go beyond repair and include the correct modeling of conversational strategies such as turn-taking, interruptions and overlapping talk, backchannels, and many other elements.
When deployed alongside the capabilities for linguistic comprehension already provided by Natural Language Processing, human-centered conversational design helps ensure the successful structuring of human-machine conversations through the establishment of mutual understanding. Thanks to the work of the conversational designer, each conversational component that fits human needs translates into the ability to conduct longer exchanges with more complete sequences, providing higher-quality data for analysis and, in general, more motivated users.
While the adaptation of human conversational models to interaction with systems obviously requires that they be recast in a form appropriate to the mental model of the machine, this effort is worthwhile because the gradual application of such models guarantees better long-term results.