Overview

Objectives

The objective of the working group (WG) is twofold. First, it aims to create the core of a Bioinformatics Semantic Web, populated by a number of sample data sources and applications representative of how the Web is used in Bioinformatics. Second, it aims to demonstrate novel, reasoning-based solutions to the following problems:

Rules for mediation and for formulating complex queries
Consistent integration of Bioinformatics data
Adaptive portals for molecular biologists

Bioinformatics is an ideal field for testing Semantic Web technologies for three reasons. First, Web-based systems and Web databases were adopted very early in Bioinformatics. Second, the dramatic increase in the data produced in the field calls for novel processing methods. Third, the high heterogeneity of Bioinformatics data requires semantics-based integration methods.

Consider the following scenario: a biologist obtains a novel DNA sequence about which nothing is known. He or she wants to run an alignment but has specific requirements for it. These requirements are captured as rules and constraints, which an online, Semantic Web-enabled sequence comparison service takes into account.

The researcher finds a number of significantly similar sequences in yeast for which gene expression data are available. The scientist then requests expression data for the relevant genes from a Semantic Web-enabled gene expression database and tool. He or she defines rules that capture which expression profiles are interesting. For example, all genes that are highly expressed at both the beginning and the end of the experiment are of interest.
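The example rule about expression profiles can be made concrete as a predicate over a profile. The following is a minimal Python sketch, not the working group's implementation. It assumes, hypothetically, that each gene's profile is a list of values (one per time point) and that "highly expressed" means exceeding a user-chosen threshold. The function names, the threshold, and the gene names and values in the usage example are all made up for illustration.

```python
from typing import Callable, Dict, List

# Assumed representation: one expression value per time point of the experiment.
Profile = List[float]
# A rule is a predicate over a profile; a researcher may combine several.
Rule = Callable[[Profile], bool]


def high_at_start_and_end(threshold: float) -> Rule:
    """Hypothetical rule builder: the gene is highly expressed at the first
    and the last time point, where 'highly expressed' is assumed to mean
    'strictly above the given threshold'."""
    def rule(profile: Profile) -> bool:
        return len(profile) > 0 and profile[0] > threshold and profile[-1] > threshold
    return rule


def select_genes(profiles: Dict[str, Profile], rules: List[Rule]) -> List[str]:
    """Return, in sorted order, the genes whose profiles satisfy every rule."""
    return sorted(gene for gene, p in profiles.items() if all(r(p) for r in rules))


# Usage with made-up data:
profiles = {
    "gene_a": [3.1, 0.4, 0.2, 2.8],
    "gene_b": [0.5, 2.9, 3.0, 0.6],
    "gene_c": [2.5, 2.2, 2.4, 2.6],
}
print(select_genes(profiles, [high_at_start_and_end(2.0)]))  # ['gene_a', 'gene_c']
```

Representing rules as plain predicates keeps them composable. Further user-defined conditions can be appended to the `rules` list without changing `select_genes`.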

The genes are part of a larger process, and the researcher is interested in their gene products, which a query to SWISSPROT determines. Do these proteins interact with each other? To answer this question, the researcher queries a Semantic Web service that computationally determines protein interactions. A user-defined rule stating what constitutes a protein domain interaction is applied on the fly to SCOP (the Structural Classification of Proteins) and to PDB (the Protein Data Bank), a large protein structure database. The rule-based sequence comparison service mentioned above is then used to determine whether the scientist's proteins of interest are similar to any of the interacting proteins computed from SCOP and PDB.

Finally, the scientist wishes to relate the protein interaction network to metabolic pathways. Because all the tools used refer to the same ontologies and terminology, defined through the Gene Ontology, the researcher can easily investigate a mapping from the interaction network to a relevant metabolic pathway obtained from a Semantic Web-enabled pathway server.

Throughout this information foraging, the scientist constantly consults literature databases to read relevant articles. Although the literature grows by some 8000 articles a week, our biologist still finds the relevant ones quickly. This is possible because he or she uses an ontology-based search facility that guides the search. The facility automatically specialises the query when it obtains too many hits and generalises it when it finds too few articles.
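The specialise/generalise behaviour of the search facility can be sketched as a walk over a term hierarchy. The following is a minimal Python sketch under stated assumptions, not the facility the working group describes. It assumes a toy "is-a" hierarchy (a term matches articles annotated with it or with any of its sub-terms) and hypothetical hit-count bounds. It also assumes a simple policy: too many hits moves the query to the sub-term with the most hits, and too few hits moves it to the parent term. The term labels, article identifiers and all function names are invented for illustration and are not taken from the Gene Ontology or any real literature database.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy: each term maps to its direct sub-terms.
CHILDREN: Dict[str, List[str]] = {
    "protein interaction": ["protein binding", "complex assembly"],
    "protein binding": ["domain interaction"],
}

# Hypothetical article index: article id -> ontology terms it is annotated with.
INDEX: Dict[str, Set[str]] = {
    "art1": {"domain interaction"},
    "art2": {"domain interaction"},
    "art3": {"protein binding"},
    "art4": {"complex assembly"},
    "art5": {"complex assembly"},
}


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its transitive sub-terms."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, []))
    return seen


def search(term: str, index: Dict[str, Set[str]],
           children: Dict[str, List[str]]) -> Set[str]:
    """Articles annotated with the term or any of its sub-terms."""
    terms = descendants(term, children)
    return {doc for doc, annots in index.items() if annots & terms}


def refine(term: str, index: Dict[str, Set[str]], children: Dict[str, List[str]],
           min_hits: int, max_hits: int, max_steps: int = 10) -> Tuple[str, Set[str]]:
    """Specialise or generalise the query term until the hit count is in range."""
    parents = {c: p for p, cs in children.items() for c in cs}
    visited = {term}
    for _ in range(max_steps):
        n = len(search(term, index, children))
        if n > max_hits:
            # Too many hits: specialise to the unvisited sub-term with the most hits.
            subs = [c for c in children.get(term, []) if c not in visited]
            if not subs:
                break
            term = max(subs, key=lambda c: len(search(c, index, children)))
        elif n < min_hits:
            # Too few hits: generalise to the parent term, unless already visited.
            parent = parents.get(term)
            if parent is None or parent in visited:
                break
            term = parent
        else:
            break  # hit count is acceptable
        visited.add(term)
    return term, search(term, index, children)


# Usage: the broad query returns 5 articles (> 3), so it is specialised.
print(refine("protein interaction", INDEX, CHILDREN, min_hits=1, max_hits=3)[0])
# protein binding
```

The `visited` set and the `max_steps` bound are design choices of this sketch. They stop the search from oscillating between a term that is too broad and a sub-term that is too narrow. A real facility would likely rank candidate terms using richer ontology information than raw hit counts.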

Rewerse Working group A2: Adding Semantics to the Bioinformatics Web
