Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach

J Diabetes Sci Technol. 2017 Jul;11(4):791-799. doi: 10.1177/1932296816681584. Epub 2016 Dec 7.

Abstract

Background: Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects.

Objective: We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects.

Methods: We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value.

Results: The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects.

Conclusions: We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users' objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.

Keywords: phenotyping; positive predictive value (PPV); sensitivity; support vector machine (SVM); type 2 diabetes mellitus (T2DM).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Diabetes Mellitus, Type 2 / diagnosis*
  • Electronic Health Records*
  • Humans
  • Phenotype
  • ROC Curve
  • Sensitivity and Specificity
  • Support Vector Machine*