Annotation of language resources: XML, TEI, OWL

Lecturer(s): Tomaz Erjavec (Institute Jozef Stefan)
Type: Foundational Course
Section: Language and Computation
Time: 11.00-12.30 (Slot 2)
Room: EM 1.82


The course deals with the digital encoding of language, which is
increasingly important due to the growing production and interchange
of annotated language resources, e.g. texts, lexica, ontologies, and
other language data.

The course introduces three TLAs: XML, TEI and OWL.
We give the rationale behind their development, and present their
component parts and related recommendations.

Starting with SGML, we present its evolution into XML, and the key
concepts: Unicode, well-formed vs. valid documents, and DTDs. Next comes
the presentation of XML-related recommendations: namespaces, XPath and
XSLT, and the various flavors of XML schemas. TEI is a complex
application of XML used to annotate a wide variety of language
resources; we introduce TEI and illustrate it with applications to
annotated multilingual corpora and feature structures. Finally, we
cover the more important Semantic Web-related initiatives for encoding
metadata and semantic features: Dublin Core, RDF, RDFS, and OWL.
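
As a rough illustration of two of the concepts listed above, well-formedness
and XPath-style selection in a namespaced document, the following minimal
sketch uses Python's standard library. It is not course material: the
TEI-like fragment, the helper names is_well_formed and lemmas, and the choice
of the namespace URI used by current TEI releases are all assumptions made
for the example. Note that the parser only checks well-formedness; validity
against a DTD or schema would need a separate validating tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment, invented for illustration only.
WELL_FORMED = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p><w lemma="word">words</w> <w lemma="be">are</w></p>
  </body>
</text>"""

# The <p> element is never closed, so this string is not well-formed XML.
NOT_WELL_FORMED = "<text><body><p>unclosed paragraph</body></text>"


def is_well_formed(xml_string):
    """Return True if the string parses as XML, False otherwise.

    This checks well-formedness only, not validity against a DTD or schema.
    """
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


def lemmas(xml_string):
    """Collect the lemma attribute of every <w> element, in document order."""
    # Prefix-to-URI map for the limited XPath subset ElementTree supports.
    ns = {"tei": "http://www.tei-c.org/ns/1.0"}
    root = ET.fromstring(xml_string)
    return [w.get("lemma") for w in root.findall(".//tei:w", ns)]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
    print(lemmas(WELL_FORMED))
```

The namespace map is needed because the elements live in a default
namespace; an unprefixed path such as ".//w" would not match them.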


© ESSLLI 2005 Organising Committee 2004-12-08