Dimitra Alexopoulou, Thomas Wächter, Laura Pickersgill, Cecilia Eyre, Michael Schroeder:
Terminologies for text-mining; an experiment in the lipoprotein metabolism domain.
Abstract
Background
Ontology engineering, especially for text-mining applications, is still a
young research field. No well-defined theory or technology for ontology
construction exists yet. Many ontology design steps remain manual and rely
on personal experience and intuition. However, a few efforts address the
automatic construction of ontologies by extracting lists of terms and the
relations between them.
Results
We report experience gained during the manual development of a lipoprotein
metabolism ontology (LMO) intended for text mining. We compare the manually
created ontology terms with the terminology derived automatically by four
different automatic term recognition (ATR) methods. Up to 89% of the top 50
predicted terms are relevant. Among the top 1000 terms, the best method
still yields 51% relevant terms. A corpus of 3066 documents contains 53% of
the LMO terms, and 38% can be generated with one of the methods.
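The two kinds of figures reported above (the share of relevant terms among the top-ranked candidates, and the share of ontology terms found) can be illustrated with a minimal sketch. The function names and the toy term lists below are hypothetical and are not taken from the paper; in particular, relevance is modelled here as simple set membership, which is an assumption and may differ from how relevance was actually judged.

```python
def precision_at_k(ranked_terms, relevant_terms, k):
    """Fraction of the top-k ranked candidate terms that are relevant."""
    top_k = ranked_terms[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for term in top_k if term in relevant_terms)
    return hits / len(top_k)


def coverage(ontology_terms, found_terms):
    """Fraction of ontology terms that also occur in found_terms."""
    if not ontology_terms:
        return 0.0
    return len(ontology_terms & found_terms) / len(ontology_terms)


# Toy data (hypothetical): a ranked candidate list from some ATR method.
ranked = ["lipoprotein", "cholesterol", "patient", "apolipoprotein", "study"]
relevant = {"lipoprotein", "cholesterol", "apolipoprotein"}
ontology = {"lipoprotein", "chylomicron", "apolipoprotein", "lipase"}

print(precision_at_k(ranked, relevant, 2))  # 1.0
print(precision_at_k(ranked, relevant, 5))  # 0.6
print(coverage(ontology, set(ranked)))      # 0.5
```

Precision is computed over the ranked candidates, while coverage is computed over the ontology, so the two measures can diverge: a method may rank few irrelevant terms highly and still miss ontology terms that never occur in the corpus.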
Conclusions
Given high precision, automatic methods can help decrease development time
and provide significant support for identifying domain-specific vocabulary.
The coverage of the domain vocabulary depends strongly on the underlying
documents. Ontology development for text mining should be performed
semi-automatically, taking ATR results as input and following the
guidelines we describe.
Availability
The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl
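For readers unfamiliar with TF-IDF ranking, the following is a minimal sketch of the textbook weighting applied to candidate terms. It does not call the Web Service and does not reproduce its implementation; the function names, the tokenised toy documents, and the choice to sum scores over documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Sum textbook TF-IDF weights per term over a list of token lists."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in documents:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)                   # relative term frequency
            idf = math.log(n_docs / doc_freq[term])    # 0 if term is in every document
            scores[term] += tf * idf
    return dict(scores)


def rank_terms(documents):
    """Return terms ordered by descending score, ties broken alphabetically."""
    scores = tfidf_scores(documents)
    return sorted(scores, key=lambda term: (-scores[term], term))


# Toy corpus (hypothetical), already tokenised.
docs = [
    ["lipoprotein", "lipoprotein", "the"],
    ["the", "study"],
    ["the", "patient", "study"],
]
print(rank_terms(docs))
```

In this toy corpus a word that occurs in every document receives a score of zero, while a word concentrated in one document ranks first, which is the intuition behind using such weights to surface domain-specific vocabulary.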
      
URL:
http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2008-041
@article{REWERSE-RP-2008-041,
	author = {Dimitra Alexopoulou and Thomas W\"achter and Laura Pickersgill and Cecilia Eyre and Michael Schroeder},
	title = {Terminologies for text-mining; an experiment in the lipoprotein metabolism domain},
	journal = {BMC Bioinformatics},
	year = {2008},
	volume = {9(Suppl 4)},
	number = {S2},
	url = {http://rewerse.net/publications/rewerse-publications.html#REWERSE-RP-2008-041}
}