Linguistic Treebanks and Data-Intensive Parsing
Lecturer(s): Sandra Kübler (Seminar für Sprachwissenschaft, University of Tübingen) and Erhard Hinrichs (Seminar für Sprachwissenschaft, University of Tübingen)
Type: Advanced Course
Section: Language and Computation
Time: 14.00-15.30 (Slot 3)
Room: EM 1.27


In recent years, linguistic treebanks have become an invaluable resource
for broad-coverage, robust parsing of natural language. With increased
availability of treebanks for a wide range of languages, it seems timely
to compare and contrast different annotation schemes and their impact on
parser development and evaluation.

The proposed course will cover central research issues that arise in this
context:

* constituent-based versus dependency-based annotation and evaluation (Lin
1998, Kübler/Telljohann 2002)

* finding optimal data representations: treebank transformations (Johnson
1998, Ule 2003)

* history-based extensions to standard PCFG models (Charniak 2001, Collins
2003, Dubey/Keller 2003)

* combining supervised and unsupervised training (Pereira/Schabes 1992)

* hybrid parsing models (Kübler 2003, Trushkina/Hinrichs 2004)

The course is intended as a follow-up to Helmut Schmid's course
"Statistical methods for natural language processing" held at
ESSLLI 2004. Course lectures will be accompanied by software
demonstrations and practical lab sessions.

