REWERSE-RP-2008-015

Oliver Fritzen:
Modeling and Querying of Distributed XML Data in Presence of 3rd Party Links.


Complete Text [.pdf, 1.31MB]
PhD Thesis, Faculty of Mathematics, Goettingen University, January 2008
In: Modeling and Querying of Distributed XML Data in Presence of 3rd Party Links, January 2008
© Goettingen State and University Library

Abstract
XML (short for eXtensible Markup Language) is a meta-language for the representation of digital data. Since its advent in 1997, XML has had an enormous impact on modern computer science and the IT industry, for several reasons. XML is simple and easily accessible: using Unicode as its encoding, XML can be viewed and edited with common text editors, and due to the context-free and well-formed structure of XML document types, it is easy to provide efficient parsers for processing XML documents. In addition, XML’s concept of definable document types enables a structured representation of almost arbitrary digital data, with the document type modeling the domain of the data. This makes XML a very powerful and flexible standard for data representation, particularly for the Web.
The XLink standard is an extension to XML for defining references between XML documents, inspired by the hyperlink concept from hypertext. XLink defines two types of links: Simple Links are unidirectional links from one document to another, similar to HTML hyperlinks. Extended Links create graph-based relationships (arcs) between portions of XML (resources) across multiple XML documents.
Within the LinXIS project, models and query evaluation for XLink have been investigated. In a logical data model, a Simple Link is given the semantics of an embedded view that “imports” the referenced data from a remote document into the link-defining document. The participating XML data, together with the Simple Links, define a virtual instance (a single-document view of the distributed data) according to the logical data model. Extended Links define relations between XML resources, but in contrast to Simple Links, they are not defined inside the participating resources but apart from them. This makes it possible to define a semantics in which an Extended Link defines views that combine and extend the participating resources from a 3rd party perspective, without the need for write access to them, thus extending the logical data model for Simple Links.
The logical data model described above provides a semantics for the evaluation of XPath queries over distributed XML data: a query may be evaluated not on a (physical) XML document, but on the virtual instance defined by the given Simple and Extended Links. The query evaluation may “follow” a Simple Link, continuing the evaluation process on the referenced, physically remote data. For Extended Links, queries can be evaluated on the integrated view combining the sources referenced by an Extended Link, based on the 3rd party semantics of the link.
A previous PhD thesis, which also emerged from the LinXIS project, introduced the data model for Simple Links and investigated techniques and algorithms for XPath query evaluation on the linked XML data. As part of that work, the data model was implemented on the basis of the open-source XML database system eXist, creating a Simple-Link-enhanced XML database prototype. The present work extends the focus from Simple to Extended Links. It includes a formal description of both Simple Link and Extended Link semantics, based on a specification as an abstract data type (ADT), and provides Extended Links with a 3rd Party Link semantics. The basic concepts for query evaluation with respect to 3rd Party Links are also investigated. The algorithms and the logical data model for 3rd Party Links are implemented by further enhancing the eXist-based prototype, providing its query evaluation unit with that semantics. The prototype is tested in a case study that evaluates its functional behavior and performance.
The case study is followed by a discussion of the proposed 3rd Party Link approach, addressing its applicability in terms of its design, its performance, and its relevance within a rapidly evolving Web infrastructure. The work concludes by revisiting the issues discussed and giving an overview of related research as well as of perspectives and further work.

URL:
http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2008-015

BibTeX:

@phdthesis{REWERSE-RP-2008-015,
	author = {Oliver Fritzen},
	title = {Modeling and Querying of Distributed XML Data in Presence of 3rd Party Links},
	school = {Faculty of Mathematics, Goettingen University},
	year = {2008},
	note = {PhD Thesis, Faculty of Mathematics, Goettingen University, January 2008},
	type = {{Dissertation/Ph.D. thesis}},
	url = {http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2008-015}
}