José Santos, Pedro Barahona, Ludwig Krippahl:
Mining Protein Structure Data.
Abstract
This paper describes the application of machine learning
algorithms to the discovery of knowledge in a protein structure
database. The problem addressed is the determination of the solvent
exposure of each amino acid residue, using different levels of exposed
surface to define exposure. First, we introduce a baseline classifier,
which achieves good prediction results despite taking into account only
the amino acid type. Then we explain how we gathered and processed the
data and built our classifier to improve on the baseline prediction.
Finally, we test and compare several classifiers (e.g. Neural Networks,
C5.0, CART and CHAID) and parameters (level of information per amino acid,
SCOP class of the protein, sliding window around the current amino acid)
that might influence the prediction accuracy. We conclude by showing that
our models achieve a modest but statistically significant improvement
over the baseline classifier's accuracy.
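The abstract does not say exactly how the amino-acid-type baseline assigns a class. One common reading of such a baseline is a majority-class rule: for each residue type, predict whichever exposure class that type most often has in the training data. The sketch below assumes that rule, and also assumes exposure is a binary buried/exposed label obtained by thresholding relative solvent accessibility (in percent). The function names, the threshold, and the toy data are hypothetical illustrations, not the authors' code or results.

```python
from collections import Counter, defaultdict


def exposure_class(rel_accessibility, threshold):
    """Hypothetical labelling: 'exposed' if relative accessibility (%) exceeds threshold."""
    return "exposed" if rel_accessibility > threshold else "buried"


def train_baseline(residues, threshold):
    """Assumed majority-class baseline: map each amino acid type to its most frequent class.

    residues: iterable of (amino_acid_one_letter_code, relative_accessibility) pairs.
    """
    counts = defaultdict(Counter)
    for aa, rsa in residues:
        counts[aa][exposure_class(rsa, threshold)] += 1
    # most_common(1) returns [(label, count)] for the most frequent label
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def accuracy(model, residues, threshold, default="buried"):
    """Fraction of residues whose predicted class matches the thresholded true class."""
    correct = sum(
        model.get(aa, default) == exposure_class(rsa, threshold)
        for aa, rsa in residues
    )
    return correct / len(residues)


if __name__ == "__main__":
    # Toy, made-up data: (amino acid type, relative accessibility in %)
    data = [("L", 5.0), ("L", 10.0), ("L", 40.0),
            ("K", 60.0), ("K", 70.0), ("K", 10.0)]
    model = train_baseline(data, threshold=25.0)
    print(model)  # {'L': 'buried', 'K': 'exposed'}
    print(accuracy(model, data, threshold=25.0))
```

Varying `threshold` corresponds to the abstract's "different levels of exposed surface"; a richer classifier would add features such as the residues in a sliding window, which this type-only baseline deliberately ignores.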
URL:
http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2007-104
@inproceedings{REWERSE-RP-2007-104,
  author    = {Jos\'{e} Santos and Pedro Barahona and Ludwig Krippahl},
  title     = {Mining Protein Structure Data},
  booktitle = {Proceedings of the 13th Portuguese Conference on Artificial Intelligence, Guimarães, Portugal (3rd--7th December 2007)},
  year      = {2007},
  pages     = {527--540},
  url       = {http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2007-104}
}