Data Release: 27 March 2017

E2E NLG Challenge

Entry Submission Deadline: 31 October 2017

Motivation

Natural language generation (NLG) plays a critical role in conversational agents, as it has a significant impact on a user’s impression of the system. This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface realisation from non-aligned data, e.g. (Wen et al., 2015; Mei et al., 2016; Dusek and Jurcicek, 2016; Lampouras and Vlachos, 2016).

So far, E2E NLG approaches have been limited to small, de-lexicalised data sets, e.g. BAGEL, SF Hotels/Restaurants, or RoboCup. In this shared challenge, we will provide a new crowd-sourced data set of 50k instances in the restaurant domain, as described in (Novikova, Lemon and Rieser, 2016). Each instance consists of a dialogue act-based meaning representation (MR) and up to 5 references in natural language. In contrast to previously used data, our data set includes additional challenges, such as open vocabulary, complex syntactic structures and diverse discourse phenomena. For example:

MR:

name[The Eagle],
eatType[coffee shop],
food[French],
priceRange[moderate],
customerRating[3/5],
area[riverside],
kidsFriendly[yes],
near[Burger King]

NL:

“The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.”
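
The MR above is a flat list of slot[value] pairs, so it maps naturally onto a dictionary. The following is a minimal sketch of reading such an MR, assuming the pairs arrive as one comma-separated string; the names parse_mr and SLOT_PATTERN are hypothetical helpers for illustration, and the released data files may use a different layout.

```python
import re

# Assumption: each slot is written as name[value], as in the example MR above.
# SLOT_PATTERN and parse_mr are illustrative names, not part of the challenge.
SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr):
    """Turn 'slot[value], slot[value], ...' into a dict of slot -> value."""
    return {slot: value.strip() for slot, value in SLOT_PATTERN.findall(mr)}


# The example MR from the text, joined into a single string (assumed layout).
mr = (
    "name[The Eagle], eatType[coffee shop], food[French], "
    "priceRange[moderate], customerRating[3/5], area[riverside], "
    "kidsFriendly[yes], near[Burger King]"
)

slots = parse_mr(mr)
for slot, value in slots.items():
    print(f"{slot}: {value}")
```

A dictionary keeps the slots in the order they appear in the MR, which may be useful if a system conditions on slot order.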

The full data set will be released to participants according to the timeline below. A sample of the data can be obtained here.

This challenge follows on from previous successful shared tasks on generation, e.g. SemEval’17 task 9 on text generation from AMR, and Generation Challenges 2008-11. However, this is the first NLG task to concentrate on (1) generation from dialogue acts and (2) the use of semantically unaligned data.

Proposed Task

The task is to generate an utterance from a given MR that is (a) similar to human-generated reference texts and (b) highly rated by humans. Similarity will be assessed using standard metrics, such as BLEU and METEOR. Human ratings will be obtained using a mixture of crowd-sourcing and expert annotations. We will also test a suite of novel metrics to estimate the quality of a generated utterance.
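
To give a feel for how word-overlap metrics compare a system output against several references, here is a simplified sketch of clipped n-gram precision, the building block of BLEU. It is not full BLEU (there is no brevity penalty, no averaging over n-gram orders, and tokenisation is plain whitespace splitting), and it is not the scoring script used in this challenge. The function names and the example sentences are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Fraction of candidate n-grams that also occur in some reference.

    Each candidate n-gram is credited at most as many times as it appears
    in the single reference where it is most frequent (clipping).
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0
    # Maximum count of every n-gram over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.lower().split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Made-up sentences, not taken from the data set.
candidate = "The Eagle is a cheap pub"
references = ["The Eagle is a coffee shop"]
p1 = clipped_precision(candidate, references, n=1)  # 4 of 6 unigrams match
p2 = clipped_precision(candidate, references, n=2)  # 3 of 5 bigrams match
print(round(p1, 3), round(p2, 3))
```

Because every instance comes with up to 5 references, taking the maximum count over all references means a valid paraphrase is not penalised for differing from any one particular reference.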

Download Training Data

Please register here to download the training and development data.

The description of the data is available here.

To cite the dataset and/or challenge, use:

@article{novikova2017e2e,
  title={The {E2E NLG} Shared Task},
  author={Novikova, Jekaterina and Du{\v{s}}ek, Ondrej and Rieser, Verena},
  year={2017}
}

Important Dates

13 March 2017:
Registration opens
27 March 2017:
Training and development data are released (MRs + references)
May 2017:
Baseline is released
16 October 2017:
Test data is released (MRs only)
31 October 2017:
Entry submission deadline
15 November 2017:
Evaluation results are released
15 December 2017:
Participants submit a paper describing their systems
February 2018:
Results presented at workshop

Contacts

Organising Committee

Jekaterina Novikova
Ondrej Dusek
Verena Rieser

Heriot-Watt University, Edinburgh, UK.

Contact Details

e2e-nlg-challenge@googlegroups.com

Advisory Committee

Mohit Bansal, University of North Carolina at Chapel Hill
Ehud Reiter, University of Aberdeen
Amanda Stent, Bloomberg
Andreas Vlachos, University of Sheffield
Marilyn Walker, University of California Santa Cruz
Matthew Walter, Toyota Technological Institute at Chicago
Tsung-Hsien Wen, University of Cambridge
Luke Zettlemoyer, University of Washington