As robots become more integrated into our daily lives, they must increasingly deal with situations in which socially appropriate interaction is vital. In such settings, it is not enough for a robot simply to plan its actions to perform particular tasks; instead, the robot must also be able to satisfy social goals and obligations that arise through interactions with people in real-world settings. As a result, a robot requires not only the necessary physical skills to perform tasks in the world, but also the appropriate social skills to understand and respond to the intentions and needs of the people it interacts with.
The goal of the JAMES project (Joint Action for Multimodal Embodied Social Systems) was to develop an artificial embodied agent that supports socially appropriate, multi-party, multimodal interaction. JAMES focused on the qualitative aspects of task achievement in social situations, and on how such tasks can be improved through multimodal communication, rather than on the physical aspects of traditional robotics tasks. In particular, JAMES developed the core cognitive capabilities that enable a robot to interact with humans in a socially appropriate manner, and demonstrated this behaviour in a bartending scenario.
To help guide the research, JAMES focused on five key objectives:
- Data collection: To record and analyse the social and task-based behaviour of humans engaged in multimodal joint activities, using a novel Ghost-in-the-Machine data-collection paradigm.
- Modelling: To design and train a model of social interaction, using annotated data from the human experiments. The model estimates the social and task-related goals of human partners from visual and auditory input, and generates appropriate responses through physical and linguistic actions.
- Adaptation: To endow the model of social interaction with the ability to learn and adapt to human behaviours, and to handle partial or uncertain information about the state of the world and the mental states of human users.
- Implementation: To implement the model of social interaction on a physical robot platform, which can operate in an environment with multiple, dynamically changing interaction partners.
- Evaluation: To evaluate the implemented robot system through physical and simulated interactions with at least two simultaneous human users in a social, task-oriented setting.
Achieving these objectives required an interdisciplinary approach to research, building on ideas from a number of core fields, including social robotics, social signal processing, machine learning, visual processing, natural language interaction, automated planning and reasoning, and data collection for system design. The architecture for the JAMES system was based on a standard multi-layer structure: the components at the low level dealt with modality-specific, highly detailed information such as spatial coordinates, word lattices, and robot arm trajectories; the mid-level components dealt with more abstract, cross-modality representations of states and events; while the high-level components reasoned about the most abstract structures, such as knowledge and action represented in a logical form.
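The layered structure described above can be illustrated with a small sketch. This is not the JAMES implementation: the class names, the distance and confidence thresholds, and the fact syntax (`nearBar`, `said`) are all hypothetical, chosen only to show how modality-specific data at the low level might be fused into a cross-modality state at the mid level and then turned into logical facts at the high level.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


# Low level: modality-specific, detailed data (hypothetical types).
@dataclass
class VisionFrame:
    person_id: str
    x_m: float  # horizontal position relative to the bar, in metres
    y_m: float


@dataclass
class SpeechHypothesis:
    person_id: str
    text: str
    confidence: float  # recogniser confidence in [0, 1]


# Mid level: an abstract, cross-modality description of one customer.
@dataclass
class CustomerState:
    person_id: str
    near_bar: bool
    utterance: Optional[str]


def fuse(frame, speech=None, near_threshold_m=1.0, min_confidence=0.5):
    """Combine vision and speech into one state (thresholds are assumptions)."""
    near = hypot(frame.x_m, frame.y_m) <= near_threshold_m
    heard = None
    if (
        speech is not None
        and speech.person_id == frame.person_id
        and speech.confidence >= min_confidence
    ):
        heard = speech.text
    return CustomerState(frame.person_id, near, heard)


# High level: knowledge expressed as simple logical facts (made-up syntax).
def to_facts(state):
    facts = set()
    if state.near_bar:
        facts.add(f"nearBar({state.person_id})")
    if state.utterance is not None:
        facts.add(f"said({state.person_id}, {state.utterance!r})")
    return facts


frame = VisionFrame("A", 0.3, 0.4)
speech = SpeechHypothesis("A", "a pint of cider", 0.9)
facts = to_facts(fuse(frame, speech))
```

Each layer only sees the output of the one below it, which is the property the multi-layer description emphasises; a real system would of course carry far richer data (word lattices, arm trajectories) at the low level.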
The specific demonstration of the JAMES research was a bartender scenario, in which the robot played the role of a bartender responding to customers' requests in a dynamic setting, with multiple customers and short interactions. Interactions in the target scenario incorporated a mixture of task-based and social behaviours, each of which presented certain challenges: a robot existing in the physical world must be able to understand and respond to both the social and the task-based needs of the humans that it encounters, and to distinguish the two successfully. Thus, while many human-robot interactions may lead to the same task goal, the quality of those interactions is greatly enhanced by getting the "people skills" right. A sample interaction in this scenario might proceed as follows:
Two people, A and B, each individually approach the robot bartender.
Robot (to A): How can I help you?
Person A: A pint of cider, please.
Person C approaches and attracts the attention of the robot by gesturing.
Robot (to C): Just a moment please.
Robot: (Serves A)
Robot (to B): What will you have?
Person B: A pint of bitter.
Robot: (Serves B)
Robot (to C): Thanks for waiting. How can I help you?
Person C: I'd like a glass of red wine.
Robot: (Serves C)
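The ordering behaviour in the exchange above (serve customers in arrival order, and acknowledge a newcomer who bids for attention while the robot is busy) can be sketched as a simple queue. This is a toy illustration under stated assumptions, not the JAMES interaction manager: the `ToyBartender` class and its method names are hypothetical, and for simplicity it uses the same greeting for every customer who was not asked to wait.

```python
from collections import deque


class ToyBartender:
    """Hypothetical first-come, first-served bartender with acknowledgements."""

    def __init__(self):
        self.queue = deque()    # customers in arrival order
        self.deferred = set()   # customers who were asked to wait
        self.current = None     # customer currently being served, if any
        self.transcript = []    # robot utterances and actions, in order

    def approach(self, customer):
        """A customer arrives at the bar and joins the queue."""
        self.queue.append(customer)

    def notice_bid(self, customer):
        """A customer bids for attention while the robot is busy with another."""
        if self.current is not None and self.current != customer:
            self.deferred.add(customer)
            self.transcript.append(f"Robot (to {customer}): Just a moment please.")

    def greet_next(self):
        """Turn to the next customer in arrival order."""
        self.current = self.queue.popleft()
        if self.current in self.deferred:
            line = "Thanks for waiting. How can I help you?"
        else:
            line = "How can I help you?"
        self.transcript.append(f"Robot (to {self.current}): {line}")
        return self.current

    def serve(self, drink):
        """Serve the current customer and become free again."""
        self.transcript.append(f"Robot: (Serves {self.current} {drink})")
        self.current = None


bar = ToyBartender()
bar.approach("A")
bar.approach("B")
bar.greet_next()            # robot addresses A
bar.approach("C")
bar.notice_bid("C")         # C gestures while A is being served
bar.serve("a pint of cider")
bar.greet_next()            # robot addresses B
bar.serve("a pint of bitter")
bar.greet_next()            # robot returns to C
bar.serve("a glass of red wine")
```

The socially relevant step is `notice_bid`: the task outcome (three drinks served) would be the same without it, but acknowledging C immediately and thanking them later is the kind of "people skill" the scenario is meant to exercise.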
At the research level, the bartender scenario challenged the core components of the JAMES system by giving rise to situations in which multiple humans were simultaneously present and potentially interacting, both with the robot and with each other. The scenario also provided a natural setting in which the physical capabilities of the robot platform needed only to be reasonably sufficient to achieve the long-term goals of the project. This allowed the project to concentrate its research effort on developing the core social skills the robot required.