Terminology and Language Issues
Terminology Problems
In different models of the same species, parts with
similar names may only denote roughly the same
tissues.
In GALEN: Lobe of left lung
Maps in FMA to:
 - Upper lobe of left lung
 - Lower lobe of left lung
A bigger problem is that, in anatomical models of
different species, parts with the same (or similar)
names do not always denote homologous tissues.
The model species we are comparing have many anatomical parts. For
example, Mouse has 3559 anatomical parts, Drosophila has 506,
and C. elegans has 242.
Does the language used in terminologies and anatomical ontologies
suggest which parts may be similar? Yes, but not directly:
context is crucial. The entire series of names from root to
leaf node is needed to ground terms that are themselves
underspecified. Grounding refers to connecting natural
language expressions, such as "mouse tail",
with a model of the world. In this case, the model is a mouse ontology.
Context is the key to clarifying terms such as "tail".
If you look at the paths for "tail" in Mouse and C. elegans,
it's clear that they are different:
Path name in Mouse:
 - embryo
  - tail
   - nervous system
    - peripheral nervous system
     - segmental spinal nerve
Path name in C. elegans:
 - organ_system
  - sex-associated system
   - male-associated system
    - male tail
     - hypodermis
      - fan hypodermis
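To make the role of context concrete, the two paths above can be written out as root-to-leaf lists and compared. The following is only a minimal sketch: the helper `context_of` and the list-based representation are assumptions made for illustration, not the method used in the original work.

```python
# Root-to-leaf paths copied from the two listings above.
MOUSE_PATH = [
    "embryo",
    "tail",
    "nervous system",
    "peripheral nervous system",
    "segmental spinal nerve",
]
C_ELEGANS_PATH = [
    "organ_system",
    "sex-associated system",
    "male-associated system",
    "male tail",
    "hypodermis",
    "fan hypodermis",
]


def context_of(path, term):
    """Split a path around the first node whose name contains `term` as a word.

    Returns (ancestors, descendants). Hypothetical helper, not from the source.
    """
    for i, node in enumerate(path):
        words = node.replace("_", " ").replace("-", " ").lower().split()
        if term.lower() in words:
            return path[:i], path[i + 1:]
    raise ValueError(f"{term!r} not found in path")


mouse_up, mouse_down = context_of(MOUSE_PATH, "tail")
worm_up, worm_down = context_of(C_ELEGANS_PATH, "tail")

# Collect every name surrounding "tail" in each species and intersect them.
shared = (set(mouse_up) | set(mouse_down)) & (set(worm_up) | set(worm_down))
print(shared)  # set()
```

The empty intersection reflects the point made in the text: the word "tail" appears in both ontologies, but nothing in its surrounding path is shared, so the name alone does not ground the term.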
Language processing techniques are used to help compare anatomical
part names across species. They include:
a. Normalizing terms to limit the effect of different descriptive styles.
b. Comparing content words by removing stop words.
c. Ensuring comparable forms of words by stemming and lemmatizing.
d. Treating the results as an unordered set.
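The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline actually used: the stop-word list is hypothetical, `crude_stem` is a deliberately simple stand-in for a real stemmer or lemmatizer, and the Jaccard overlap is just one plausible way to compare the resulting sets (the source does not say how the sets were compared).

```python
import re

# Hypothetical stop-word list; the source does not give the one actually used.
STOP_WORDS = {"of", "the", "a", "an", "and", "in", "to"}


def crude_stem(word):
    """Rough stand-in for a real stemmer/lemmatizer: strips a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def normalize(name):
    """Turn an anatomical part name into an unordered set of content words."""
    # (a) Normalize style: lowercase, treat punctuation, "_" and "-" as spaces.
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    # (b) Drop stop words, (c) reduce to a comparable form, (d) unordered set.
    return frozenset(crude_stem(t) for t in tokens if t and t not in STOP_WORDS)


def jaccard(a, b):
    """Share of words common to both sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Different descriptive styles reduce to the same set of content words.
print(normalize("Lobes of the Left Lung") == normalize("lobe of left lung"))  # True

# Mouse "tail" and C. elegans "male tail" overlap lexically.
print(jaccard(normalize("tail"), normalize("male tail")))  # 0.5
```

A lexical overlap like the one between "tail" and "male tail" is only a suggestion of similarity; as the path comparison earlier shows, it still has to be checked against the structural context of each term.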
The results of the language processing experiments show
that the lexical suggestions were indicative of structural
support in 75% of the comparisons.