Reports

This page collects the scientific and technical reports produced by the project.

Visual perception of human activity

This work is concerned with the development of visual processing components that can reliably track the location, gestures, and facial displays of multiple people in a constantly changing scene. Such information is necessary in the bartending domain to identify the users interacting (or not interacting) with the JAMES robot, along with the related objects in the scene, so that the robot can react and respond appropriately.

  • D1.1 - Visual Identification and Tracking of Humans [pdf]
    Maria Pateraki, Haris Baltzakis, Panos Trahanias
    Perception of human multimodal social signals in the JAMES scenario is one of the main challenges of the project. To this end, the JAMES vision system should be able to solve tasks such as detection and tracking of humans and of individual body parts (faces and hands), visual speech detection, head pose estimation, and recognition of hand gestures and facial expressions. In Task 1.1, appropriate algorithms are created to (a) detect and track humans in the vicinity of the robot, (b) detect and track the faces and hands of the respective humans, and (c) detect and track objects used in the hand actions/gestures of humans addressed in Tasks 1.2 and 1.3. For this purpose, information is fused from visual sensors. This document reports the work done in Task 1.1. It gives details regarding the individual algorithms and methodologies that have been developed and presents the achieved results.

  • D1.2 - Recognition of Hand Gestures, Facial Expressions, and Conversational States [pdf]
    Maria Pateraki, Haris Baltzakis, Panos Trahanias
    Perception of human multimodal social signals in the JAMES scenario is one of the main challenges of the project. The JAMES visual perception system emphasizes the capabilities that are directly related to the development of the robot's social interaction skills in dynamic scenes, such as (a) tracking of humans and their individual body parts as well as objects, (b) extracting pose-related information from the body and face, and (c) recognizing communicative (waving, pointing) and manipulative (grabbing/holding an object) gestures. These capabilities are in agreement with the results derived from WP8, which characterise the nonverbal behaviour that humans use to order a drink and define clear visual cues to be integrated into the visual perception system. This document reports the work done in Tasks 1.2 and 1.3. It gives details regarding the individual algorithms and methodologies that have been developed and presents the achieved results.

Natural language communication

This work is concerned with developing components for recognising, understanding, and generating embodied natural language. In particular, the main objectives of this work include developing a speech-recognition system for use in the JAMES environment, developing a natural-language grammar that can be used for both input understanding and output generation, and implementing a multimodal presentation planner capable of controlling the robot hardware.
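
As a toy illustration of a grammar that serves both understanding and generation, the Python sketch below drives both directions from a single rule set. The rules, names, and coverage are invented for this sketch; the project's actual grammar is far richer.

    # A toy bidirectional grammar: one rule set maps dialogue acts to
    # surface strings (generation) and back (understanding). All names
    # and rules here are illustrative assumptions.
    RULES = {
        ("order", "water"): "i would like a water",
        ("order", "juice"): "i would like a juice",
        ("greet", None): "hello",
    }

    def generate(act: str, arg: str | None) -> str:
        """Output generation: realise a dialogue act as a surface string."""
        return RULES[(act, arg)]

    def understand(utterance: str) -> tuple[str, str | None]:
        """Input understanding: recover the dialogue act from a string."""
        for (act, arg), surface in RULES.items():
            if surface == utterance.lower().strip():
                return (act, arg)
        return ("unknown", None)

    # The same rules work in both directions.
    assert understand(generate("order", "juice")) == ("order", "juice")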

  • D2.1 - Task-Based Linguistic Interaction [pdf]
    Amy Isard
    The general objective of WP2 is to develop components for recognising, understanding, and generating embodied natural language. In this report we describe the natural language processing components included in the initial integrated system which can recognize and generate simple task-based utterances.

  • D2.2 - Socially-Enabled Linguistic Interaction [pdf]
    Amy Isard
    The general objective of WP2 is to develop components for recognising, understanding, and generating embodied natural language. In this report we describe the natural language processing components included in the final integrated system, which can understand and generate a wide range of utterances and multimodal social behaviours.

Social state processing

This work focuses on the mid-level input processing components of the system (multimodal fusion and social state estimation) and on the core research challenge in this area: the automatic detection of social signals based on low-level sensor data. The resulting system classifies both task-based and social intentions, and also identifies instances of non-communicative actions.
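
As a hedged illustration of what such state estimation might look like, the sketch below maps fused per-person sensor estimates to coarse intention labels using hand-written rules. The field names, labels, and thresholds are assumptions made for this sketch, not the system's actual design.

    # Hypothetical rule-based social state recognition over fused sensor
    # estimates; all names and thresholds are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class FusedEstimate:
        """Per-person estimate from (hypothetical) multimodal fusion."""
        distance_to_bar: float  # metres
        facing_bartender: bool  # e.g. from head-pose estimation
        is_speaking: bool       # e.g. from visual speech detection

    def classify_intention(e: FusedEstimate) -> str:
        """Map low-level estimates to a coarse intention label."""
        if e.distance_to_bar < 0.5 and e.facing_bartender:
            return "engaged" if e.is_speaking else "seeking-attention"
        if e.is_speaking:
            return "social-chat"  # speaking, but not oriented to the bar
        return "not-seeking-attention"  # non-communicative behaviour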

  • D3.1 - Multimodal Fusion and Basic Social State Estimation [pdf]
    Mary Ellen Foster, Zhuoran Wang, Oliver Lemon
    This deliverable reports on two distinct strands of work within WP3. We begin by describing the multimodal fusion and rule-based social state recognition components that were developed during the first phase of the project and that were included in the first-year JAMES integrated system. We then describe the development of the initial data-driven social state recognition system, which uses supervised learning based on annotated videos and logs of users interacting with the first-year system.

  • D3.2 - Multimodal Social State Processing [pdf]
    Mary Ellen Foster, Simon Keizer, Oliver Lemon, Zhuoran Wang
    This deliverable provides a full report of the representations and techniques employed in the final JAMES social state processing system, which supports a wide range of social signals based on the enhanced messages provided by the final vision and linguistic components (D1.2 and D2.2) and makes use of data from the human-human studies in D8.1-D8.3 and the human-robot studies in D7.1-D7.2. It also includes an indication of the results from the system evaluations that are specific to the task of social state processing.

Planning and reasoning

This work addresses the problem of high-level planning and reasoning. Reasoning and planning are essential for an intelligent agent acting in a dynamic and incompletely known world: achieving goals under such conditions often requires complex forward deliberation that cannot easily be achieved by simply reacting to a situation without considering the long-term consequences of a course of action. Action selection is carried out by a knowledge-level planner which reasons about the agent's knowledge and how that knowledge changes due to action (physical robot actions or linguistic speech acts).
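
To make the idea of knowledge-level planning concrete, the minimal sketch below models actions by their effects on the agent's knowledge rather than on the world state, so that a speech act and a physical action can be chained in a single plan. The predicates and action names are hypothetical, and far simpler than PKS's actual representation.

    # Illustrative knowledge-level actions: preconditions and effects are
    # stated over what the agent knows, not over the world itself.
    def ask_drink_order(knowledge: set[str]) -> set[str]:
        """A linguistic speech act whose effect is on knowledge."""
        assert "knows(attention(customer))" in knowledge  # precondition
        # Afterwards the agent knows *the value of* the order, even though
        # that value itself is unknown at plan time.
        return knowledge | {"knows-value(order(customer))"}

    def serve_drink(knowledge: set[str]) -> set[str]:
        """A physical action, applicable once the order is known."""
        assert "knows-value(order(customer))" in knowledge  # precondition
        return knowledge | {"knows(served(customer))"}

    # Forward deliberation: chain actions by their knowledge effects.
    state = {"knows(attention(customer))"}
    for action in (ask_drink_order, serve_drink):
        state = action(state)
    assert "knows(served(customer))" in state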

  • D4.1 - Specification of High-Level Representations [pdf]
    Ron Petrick, Mary Ellen Foster
    This deliverable describes the initial representations used to integrate the main technical components in work packages WP3 and WP4 in the initial JAMES system. This report includes a definition of the states produced by the social state recogniser and used as input by the high-level planner, and the plans generated by the planner to be processed by the system's output planner. A description of the high-level planning domain (i.e., state properties, objects, and actions) for the initial JAMES system is also provided.

  • D4.2 - Initial Extensions for Knowledge-Level Planning and Heuristic Search [pdf]
    Ron Petrick
    This deliverable reports on the high-level planning and reasoning components included in the initial integrated system (D7.1), including the underlying PKS planner and an associated plan execution monitor. These components make use of the initial representations developed in D4.1, process state messages provided by the WP3 social state recogniser (D3.1), and provide plans understandable by the output planner (D2.1). This deliverable also describes the state of proposed heuristic search extensions for PKS.

  • D4.3 - Knowledge-Level Planning and Reasoning in Social State Spaces [pdf]
    Ron Petrick
    This deliverable reports on developments to the planning and reasoning components previously described in deliverables D4.1 and D4.2, to provide knowledge-level planning and reasoning capabilities for the social states supported by the final version of the JAMES robot bartender system. These components make use of PKS's extended search mechanisms, together with a simple mechanism for reducing probabilistic social state information to non-probabilistic disjunctive knowledge usable by the planner. They are applied to domain models developed from the results of human-human studies on communicative intent (D8.2 and D8.3) and the learnt social state models (D3.1 and D3.2). A small illustration of the probabilistic-to-disjunctive reduction is sketched below.
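
The sketch below is a hedged illustration of one way such a reduction could work: a probability distribution over a state property is collapsed either to categorical knowledge or to a disjunction over the plausible values. The thresholds and predicate syntax are assumptions for this sketch, not the values used in the system.

    # Reduce a probability distribution to planner-usable knowledge:
    # near-certain values become categorical facts; otherwise the
    # plausible values become a disjunction.
    def to_disjunctive_knowledge(dist: dict[str, float],
                                 know_threshold: float = 0.9,
                                 keep_threshold: float = 0.1) -> str:
        best_value, best_p = max(dist.items(), key=lambda kv: kv[1])
        if best_p >= know_threshold:
            return f"K({best_value})"  # categorical knowledge
        plausible = [v for v, p in dist.items() if p >= keep_threshold]
        return f"K({' | '.join(plausible)})"  # disjunctive knowledge

    # e.g. an uncertain estimate of a customer's intention:
    print(to_disjunctive_knowledge(
        {"seeking-attention": 0.65, "chatting": 0.30, "leaving": 0.05}))
    # -> K(seeking-attention | chatting)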

Machine learning for social skills execution

This work is focused on applying machine learning to the task of selecting appropriate social behaviour for the robot, by building on techniques that have been applied successfully to spoken dialogue systems and adapting them to this new context.

  • D5.1 - Initial Social Skills Learning Component and Simulation Environment [pdf]
    Simon Keizer, Oliver Lemon
    This deliverable describes the initial social skills execution and learning component, along with the simulation environment that is used for testing and training. The social skills executor (SSE) generates output actions for the system, including both communicative and non-communicative actions. The SSE is modelled as a hierarchy of Markov Decision Processes (MDPs) with policies that can be trained simultaneously using Hierarchical Reinforcement Learning. The simulation environment consists of multiple simulated users that enter the scene, try to get the system's attention, and order a drink. The SSE policies are optimised in interaction with this simulation environment, making use of the reward signals provided by the simulated users. (A minimal sketch of one level of such a set-up is given below.)
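
The sketch below shows one level of such a set-up: a tabular Q-learning agent choosing bartender actions and updating from the reward a simulated user would provide. The states, actions, and parameters are placeholders rather than the actual SSE design, and a single MDP stands in for the full hierarchy.

    # One level of a (hypothetical) MDP hierarchy, trained with tabular
    # Q-learning against a simulated user's reward signal.
    import random
    from collections import defaultdict

    ACTIONS = ["greet", "ask-order", "serve", "wait"]
    q = defaultdict(float)  # Q-values indexed by (state, action)
    alpha, gamma, epsilon = 0.1, 0.95, 0.2

    def choose(state: str) -> str:
        """Epsilon-greedy action selection."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(state, a)])

    def update(state: str, action: str, reward: float,
               next_state: str) -> None:
        """One Q-learning step using the simulated user's reward."""
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next
                                       - q[(state, action)])

    # One simulated turn: the user rewards a well-timed greeting.
    s = "customer-seeking-attention"
    a = choose(s)
    update(s, a, reward=1.0 if a == "greet" else -0.1, next_state="engaged")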

Social robotics and embodiment

This work is centred on the physical robot platform, and addresses problems related to interaction and communication with humans in a socially appropriate manner. A crucial issue in this area is the concept of embodiment and the idea that a robot necessarily exists in the physical world and is able to carry out physical tasks on its own or in collaboration with human partners. This work addresses a number of problems in human-robot interaction, with particular emphasis on the interaction context, the number of interaction partners, and the range of social behaviours supported.

  • D6.1 - Initial Robotics Components and Simulation Environment [pdf]
    Manuel Giuliani, Andre Gaschler, Markus Rickert
    This deliverable reports on the architecture and hardware components of the JAMES human-robot interaction system after the first project year. It contains an overview of the software that was used to program the robotics components and of the communication between software modules. Additionally, it reports on the JAMES robot simulation environment and documents the usage of the simulator.

  • D6.2 - Embodiment for Social Interaction [pdf]
    Manuel Giuliani, Andre Gaschler, Sören Jentzsch
    This deliverable reports on the complete setup of the two robots that were used in the JAMES project. This includes a description of the robot hardware and software, as well as a description of the robots' manipulation skills. The deliverable also contains a summary of the experiments conducted to measure the naturalness of the robot movements. Finally, the report contains an overview of the three attached papers, which give more details about the robot path planning work that has been carried out.

System integration and evaluation

This work focuses on the coordination of project-wide integration activities for implementation on the JAMES robot platform, and for carrying out system evaluations. These activities include: building a technical infrastructure that allows all components to communicate, coordinating the development and integration of the overall system, providing technical support for data-collection studies, supporting interim formative evaluations of the system components and the overall demonstrator system, and carrying out full user evaluations of the final implemented human-robot system.

  • D7.1 - First Integrated System: Prototype and Evaluation [pdf]
    Mary Ellen Foster, Andre Gaschler, Manuel Giuliani, Amy Isard, Maria Pateraki, Ron Petrick
    This deliverable describes the initial integrated JAMES system, which combines initial components from all of the technical work packages (WPs 1-6). The system is able to support simple, primarily task-based interactions in the JAMES bartending scenario. The deliverable consists of a clear specification of the interactions that the system supports, along with a video demonstrating the system functionality. It also presents the results of a user evaluation testing the basic functionality of all components, along with the ability of the system to engage in simple interactions in the JAMES scenario.

  • D7.2 - Second Integrated System: Prototype and Evaluation [pdf]
    Mary Ellen Foster, Andre Gaschler, Manuel Giuliani, Amy Isard, Simon Keizer, Maria Pateraki, Ron Petrick, Markos Sigalas
    This deliverable corresponds to the intermediate integrated JAMES system, which combines enhanced components from all of the technical work packages. In addition to supporting the simple, primarily task-based interactions from D7.1, this enhanced system is also able to recognise and generate social signals similar to those exhibited in the initial human-human data-collection studies (D8.1). The deliverable consists of a clear specification of the scenario, architecture and components of the updated system, along with the design of a pair of user studies testing the performance of all system components.

  • D7.3 - Final Integrated System: Prototype and Evaluation [pdf]
    Mary Ellen Foster, Andre Gaschler, Manuel Giuliani, Amy Isard, Simon Keizer, Maria Pateraki, Ron Petrick, Markos Sigalas, Zhuoran Wang
    This deliverable corresponds to the final integrated JAMES system, which combines the final, fully-functional components from all of the technical work packages. This final system supports a wide range of social behaviours based on those observed in the data-collection studies (D8.2 and D8.3). The deliverable consists of a clear specification of the interactions that the system supports. It also presents the results of several user evaluations testing the full functionality of the system, focusing particularly on the enhanced social-behaviour components added to the system compared to the interim system evaluated in D7.2.

  • D7.4 - Extended System Evaluation: Uncertain Conditions [pdf]
    Mary Ellen Foster, Andre Gaschler, Manuel Giuliani, Amy Isard, Simon Keizer, Maria Pateraki, Ron Petrick, Markos Sigalas, Zhuoran Wang
    This deliverable addresses the modelling and exploitation of uncertainty in the complete JAMES system, focusing on testing the extended representations of uncertainty developed across all core system components in WP1-WP6, following the experiments reported in Deliverable D7.3. We summarise how uncertainty is represented and used throughout the system, and then report on experiments addressing the addition of uncertainty.

Multimodal data collection

This work focuses on the collection and analysis of high-quality, clearly annotated, natural, multimodal data to train the project's learning models and inform the implementation of the embodied robot system. Data is gathered using a novel Ghost-in-the-Machine data-collection paradigm, in which a participant plays the role of the artificial agent, making use of only the input and output channels that are supported in the system.

  • D8.2 - Intention-Recognition Study [pdf]
    Sebastian Loth, Kerstin Huth, Jan de Ruiter
    This deliverable reports experimental findings on how humans recognised the intention of customers in a laboratory setting. The experiments used pictures and videos from the corpus presented in Deliverable D8.1. In a picture classification task, participants indicated whether customers had the intention to order. In a second experiment, the temporal agreement of participants when identifying the intention to order was measured using video stimuli. In cases of unexpected responses, participants explained how they had interpreted the behaviour of a customer. Results showed that two signals are necessary and sufficient for identifying an order: being directly at the bar and looking at the bar(tender). Response times indicated that participants checked the area at the bar first and assessed looking direction afterwards. (A minimal rule sketch based on this finding appears at the end of this page.)

  • D8.3 - Ghost-in-the-Machine Study [pdf]
    Sebastian Loth, Kerstin Huth, Jan de Ruiter
    This deliverable reports on the newly developed Ghost-in-the-Machine paradigm. The design, the data processing, and the on-screen presentation are described, and an initial experiment in Bielefeld is reported. The results include the eye-tracking data of participants inspecting the visual display and an analysis of the required signals and thresholds. The responses that participants selected from the robot's repertoire are summarised and analysed, and guidelines and strategies for the robotic system are derived from the experimental data.

  • D8.4 - Ghost-in-the-Machine Study [pdf]
    Sebastian Loth, Katharina Jettka, Jan de Ruiter, Manuel Giuliani
    This deliverable reports on advancing the Ghost-in-the-Machine paradigm for investigating real-time interactions with the JAMES robot at FORTISS. We summarise the difficulties encountered in developing the design in the first study, and the improvements that turned it into a real-time, real-life research tool in the second study. The results showed that participants made use of early, uncertain hypotheses from the estimation of the customer's body position and from the automatic speech recognition. Furthermore, we summarise the robot actions that were selected for creating a credible social interaction.
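
Finally, the intention-recognition finding reported in D8.2 above is simple enough to state as a literal rule; the sketch below does exactly that, with illustrative names only.

    # The two signals D8.2 found necessary and sufficient for recognising
    # an intention to order, written as a literal rule. Function and
    # parameter names are illustrative.
    def intends_to_order(at_bar: bool, looking_at_bartender: bool) -> bool:
        """Both signals are required: proximity alone or gaze alone is
        not sufficient (participants checked the bar area first, then
        assessed gaze)."""
        return at_bar and looking_at_bartender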