Metadata extraction using text mining

Stud Health Technol Inform. 2009:147:95-104.

Abstract

Grid technologies have proven very successful in eScience, and in healthcare in particular, because they make it easy to combine proven solutions for data querying, integration, and analysis in a secure, scalable framework. Integrating the services that implement these solutions into a given Grid architecture requires some metadata, for example information about low-level access to the services, security information, and documentation for the user. In this paper, we investigate how relevant metadata can be extracted, using text mining methods, from semi-structured textual documentation of the algorithm underlying the service. In particular, we investigate the semi-automatic conversion of functions of the statistical environment R into Grid services, as implemented by the GridR tool, through the generation of appropriate metadata.
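To illustrate the kind of semi-structured input involved, the following is a minimal sketch of a rule-based extraction of a function's name, title, and argument descriptions from R documentation in the Rd markup format. It is not the text mining method of the paper and not part of GridR. The function names, the documented function scale_values, and the output layout are hypothetical and chosen only for illustration. The sketch does not handle escaped braces or \item entries nested inside descriptions.

```python
import re


def read_braced(text, start):
    """Return (content, end_index) for the brace group opening at text[start]."""
    assert text[start] == "{"
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


def section(text, tag):
    """Return the content of the first Rd section with this tag, or None."""
    m = re.search(r"\\" + re.escape(tag) + r"\{", text)
    if m is None:
        return None
    content, _ = read_braced(text, m.end() - 1)
    return content.strip()


def extract_metadata(rd_text):
    """Collect name, title, and argument descriptions (hypothetical layout)."""
    args = {}
    body = section(rd_text, "arguments") or ""
    for m in re.finditer(r"\\item\{", body):
        name, pos = read_braced(body, m.end() - 1)
        # An argument entry has a second brace group holding the description.
        if pos < len(body) and body[pos] == "{":
            desc, _ = read_braced(body, pos)
            args[name.strip()] = " ".join(desc.split())
    return {
        "name": section(rd_text, "name"),
        "title": section(rd_text, "title"),
        "arguments": args,
    }


# Hypothetical Rd fragment, not taken from any real package.
EXAMPLE_RD = r"""
\name{scale_values}
\title{Scale a numeric vector}
\arguments{
  \item{x}{A numeric vector.}
  \item{center}{Logical; subtract the
    mean first?}
}
"""

metadata = extract_metadata(EXAMPLE_RD)
```

Here metadata holds the function name, its title, and a mapping from each argument to its description. Turning such a record into a complete service description would presumably still need the access and security information mentioned above, which the documentation alone does not provide.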

MeSH terms

  • Algorithms
  • Information Storage and Retrieval / methods*
  • Medical Informatics / organization & administration*