Normalization of gene/protein names in biological literatures using Vector-Space Model

Annu Int Conf IEEE Eng Med Biol Soc. 2007:2007:390-3. doi: 10.1109/IEMBS.2007.4352306.

Abstract

As the volume of biological literature grows exponentially, the need for text mining systems is increasing. In text mining, normalization is the task of mapping gene/protein names mentioned in text to identifiers in a database. It is necessary for combining information extracted from different publications and for building a database or an ontology from the literature. Previous normalization studies compared database entries directly with names found in the literature, but this approach is weak against the highly variable forms that gene/protein names take in text. Therefore, in this paper, we propose a normalization method using the Vector-Space Model. For each gene/protein name, we rank identifiers using the Vector-Space Model and select the identifier most similar to the name. Experimental results show that the proposed method achieves an F-measure of 70.7%.
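The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which features or term weights the authors used, so everything below is an assumption made for illustration: names are represented as character trigrams, weighted by a smoothed TF-IDF, and compared with cosine similarity. The lexicon, its identifiers (ID:0001 and so on), and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(name, n=3):
    """Lowercase a name, drop non-alphanumeric characters, and
    return its character n-grams with '#' boundary padding."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    padded = "#" + cleaned + "#"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_index(lexicon):
    """Build one TF-IDF vector per identifier from all of its synonyms.

    lexicon maps an identifier to a list of synonym strings.
    Returns (idf, vectors); both are plain dictionaries.
    """
    docs = {
        ident: Counter(g for syn in synonyms for g in char_ngrams(syn))
        for ident, synonyms in lexicon.items()
    }
    n_docs = len(docs)
    doc_freq = Counter(g for counts in docs.values() for g in counts)
    # Smoothed IDF (an assumed weighting, not the paper's).
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1.0
           for g, df in doc_freq.items()}
    vectors = {
        ident: {g: tf * idf[g] for g, tf in counts.items()}
        for ident, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_identifiers(mention, idf, vectors):
    """Return (score, identifier) pairs sorted from most to least similar."""
    counts = Counter(char_ngrams(mention))
    # N-grams never seen in the lexicon carry no weight in the query.
    query = {g: tf * idf[g] for g, tf in counts.items() if g in idf}
    scored = [(cosine(query, vec), ident) for ident, vec in vectors.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy lexicon; the identifiers are placeholders.
lexicon = {
    "ID:0001": ["tumor protein p53", "TP53", "p53"],
    "ID:0002": ["breast cancer 1", "BRCA1"],
    "ID:0003": ["interleukin 2", "IL-2", "IL2"],
}
idf, vectors = build_index(lexicon)

for mention in ["Il-2", "P-53 protein"]:
    best_score, best_id = rank_identifiers(mention, idf, vectors)[0]
    print(mention, best_id, round(best_score, 3))
```

Character n-grams are chosen here only because they tolerate the spelling variation the abstract mentions (hyphens, case, spacing); a token-based representation would fit the same ranking scheme. In practice a minimum similarity threshold would likely be needed so that mentions with no good match are left unmapped.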

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Abstracting and Indexing
  • Databases, Genetic*
  • Genes*
  • Models, Theoretical*
  • Proteins*
  • Terminology as Topic*

Substances

  • Proteins