Vaida Jakoniene:
Integration of Biological Data.
Abstract
Data integration is an important procedure underlying many
research tasks in the life sciences, because the relevant data often
has to be collected from multiple data sources. These sources vary in
content, data format, and access methods, which can greatly complicate
the data retrieval process. As a result, retrieving data requires
considerable effort and expertise on the part of the user. To
alleviate these difficulties, various information integration systems have
been proposed in the field. However, a number of issues remain unsolved, and
new integration solutions are needed.
The work presented in this thesis considers data integration at three
different levels. 1) Integration of biological data sources deals with
integrating multiple data sources from the point of view of an information
integration system. We study the properties of biological data sources and
of existing integration systems. Based on this study, we formulate
requirements for systems that integrate biological data sources. We then
define a query language that supports queries commonly used by
biologists. We also propose a high-level architecture for an
information integration system that meets a selected set of these
requirements and supports the specified query language. 2) Integration of
ontologies deals with finding overlapping information between
ontologies. We develop and evaluate ontology alignment algorithms that use
life science literature and take the structure of the ontologies into
account. 3) Grouping of biological data entries deals with organizing data
entries into groups based on similarity values computed between the
entries. We propose a method that covers the main steps and
components involved in similarity-based grouping procedures. The
applicability of the method is illustrated by a number of test
cases. Further, we develop an environment that supports the comparison and
evaluation of different grouping strategies.
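As a rough illustration of what a similarity-based grouping procedure involves, the following Python sketch computes a similarity value for every pair of data entries and places entries in the same group whenever they are linked by similarities at or above a threshold. It is not code from BioTRIFU or KitEGA, nor the method proposed in the thesis. The keyword-set representation of entries, the Jaccard similarity measure, the connected-components grouping rule, the threshold, the sample entries, and the function names `jaccard` and `group_entries` are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_entries(entries, threshold):
    """Hypothetical grouping step: link every pair of entries whose similarity
    is >= threshold, then return the connected components (single linkage),
    found with a union-find structure."""
    parent = {key: key for key in entries}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compute a similarity value for every pair of entries.
    for a, b in combinations(entries, 2):
        if jaccard(entries[a], entries[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    # Collect entries by the root of their component.
    groups = {}
    for key in entries:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data entries, each reduced to a set of annotation keywords.
example_entries = {
    "P1": {"kinase", "ATP", "membrane"},
    "P2": {"kinase", "ATP"},
    "P3": {"ribosome", "RNA"},
    "P4": {"RNA", "ribosome", "translation"},
}

if __name__ == "__main__":
    print(group_entries(example_entries, threshold=0.5))
    # → [['P1', 'P2'], ['P3', 'P4']]
```

Swapping in a different similarity measure, or a different rule for forming groups from the similarity values (for example, average linkage instead of connected components), yields a different grouping strategy; comparing such alternatives is the kind of task the evaluation environment described above is meant to support.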
The work is supported by the implementation of 1) BioTRIFU, a prototype of
a system integrating biological data sources; 2) algorithms for ontology
alignment; and 3) KitEGA, an environment for evaluating strategies for
similarity-based grouping of biological data.
URL:
http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2006-137
@phdthesis{REWERSE-RP-2006-137,
  author = {Vaida Jakoniene},
  title = {Integration of Biological Data},
  school = {Institute of Computer Science, LMU, Munich},
  year = {2006},
  note = {PhD Thesis, Linköping University, Department of Computer and Information Science, September 2006},
  type = {{Dissertation/Ph.D. thesis}},
  url = {http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2006-137}
}