Generating reference models for structurally complex data. Application to the stabilometry medical domain

Methods Inf Med. 2013;52(5):441-53. doi: 10.3414/ME12-01-0106. Epub 2013 Sep 6.

Abstract

Objectives: We present a framework designed specifically to deal with structurally complex data in which all individuals share the same structure, as is the case in many medical domains. A structurally complex individual may be composed of any type of single-valued or multivalued attributes, including time series. These attributes are structured according to domain-dependent hierarchies. Our aim is to generate reference models of population groups. These models represent the population archetype and support tasks such as diagnosis, fraud detection, analysis of patient evolution, and identification of control groups.

Methods: We have developed a conceptual model to represent structurally complex data hierarchically. We have also devised a method that uses the similarity tree concept to measure how similar two structurally complex individuals are, together with an outlier detection and filtering method. These methods provide the groundwork for the method that we have designed for generating a reference model from a set of structurally complex individuals. A key idea of this method is to use event-based analysis for modeling time series.
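To make the idea of comparing two hierarchically structured individuals concrete, the following is a minimal sketch and not the authors' method. It assumes that each individual is a nested dictionary with identical keys, that every leaf attribute is a number already scaled to [0, 1], and that a node's similarity is the unweighted mean of its children's similarities. The function name, the attribute names, and the aggregation rule are all hypothetical; time series and event-based analysis are not modeled here.

```python
from statistics import mean


def similarity(a, b):
    """Recursively compare two individuals that share the same structure.

    Hypothetical sketch: returns a value in [0, 1], where 1 means identical.
    """
    if isinstance(a, dict):
        # Internal node: combine the similarities of the child attributes.
        # The unweighted mean is an assumption made for illustration only.
        return mean(similarity(a[key], b[key]) for key in a)
    # Leaf node: attributes are assumed to be pre-scaled to [0, 1].
    return 1.0 - abs(a - b)


# Two made-up individuals with the same hierarchy (names are illustrative).
p1 = {"demographics": {"age": 0.30, "height": 0.60},
      "test": {"sway_x": 0.20, "sway_y": 0.40}}
p2 = {"demographics": {"age": 0.50, "height": 0.60},
      "test": {"sway_x": 0.20, "sway_y": 0.80}}

score = similarity(p1, p2)
print(round(score, 2))
```

Because the recursion mirrors the attribute hierarchy, each intermediate result is the similarity of one subtree, which is one way to read the notion of a tree of similarities.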

Results: The proposed framework has been applied to the medical field of stabilometry. To validate the outlier detection method, we used 142 individuals; the outlier ratings by the experts and by the system matched for 139 individuals (97.8%). To validate the reference model generation method, we applied k-fold cross validation (k = 5) with 60 athletes (basketball players and ice-skaters), and the system correctly classified 55 (91.7%). We then added 30 non-athletes as a control group, and the method output the correct result in 96.6% of cases.
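The validation protocol above can be illustrated with a short sketch of how 60 individuals split into 5 folds and how the reported accuracy follows from the counts. This is an assumption-laden illustration: the abstract does not say how individuals were assigned to folds, so contiguous, unshuffled folds are used here, and the helper name k_fold_indices is hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items.

    Hypothetical helper: fold assignment in the study is not described,
    so no shuffling or stratification is applied here.
    """
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# 60 athletes, k = 5: each fold holds out 12 individuals and trains on 48.
folds = list(k_fold_indices(60, 5))

# Accuracy from the counts reported in the abstract: 55 of 60 correct.
accuracy = 55 / 60
print(len(folds), round(accuracy * 100, 1))
```

Each individual appears in exactly one test fold, so the 55 correct classifications are counted over all 60 athletes once.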

Conclusions: We have achieved very satisfactory results both in the tests on data from a domain as complex as stabilometry and in the comparison of the reference model generation method with other methods. These results support the validity of the framework.

Keywords: Data mining; outlier detection; reference models; structurally complex data; time series.

Publication types

  • Validation Study

MeSH terms

  • Algorithms
  • Cluster Analysis*
  • Data Mining / methods*
  • Humans
  • Medical Informatics
  • Models, Theoretical*
  • Patients*
  • Reference Standards
  • Reproducibility of Results