A general framework for developing computable clinical phenotype algorithms

J Am Med Inform Assoc. 2024 May 15:ocae121. doi: 10.1093/jamia/ocae121. Online ahead of print.

Abstract

Objective: To present a general framework that provides high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes). The framework covers a variety of approaches, including but not limited to machine learning and natural language processing methods that incorporate rich electronic health record data.

Materials/methods: Drawing on extensive prior phenotyping experience and on insights from three algorithm development projects conducted specifically for this purpose, our team, with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods, conceptualized the stages of algorithm development and, for each stage, a corresponding set of principles, strategies, and practical guidelines for improving the development process.

Results: We propose five stages of algorithm development and corresponding principles, strategies, and guidelines: 1) assessing fitness-for-purpose, 2) creating gold standard data, 3) feature engineering, 4) model development, and 5) model evaluation.
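The abstract names model evaluation (stage 5) against gold standard data (stage 2) but does not prescribe specific metrics. As a minimal sketch, assuming the algorithm emits a binary phenotype label and a chart-review gold standard exists for the same patients, the validation metrics commonly reported for phenotype algorithms could be computed as below. The names `evaluate_phenotype` and `ValidationMetrics` are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class ValidationMetrics:
    sensitivity: float  # TP / (TP + FN): share of true cases the algorithm finds
    specificity: float  # TN / (TN + FP): share of non-cases correctly excluded
    ppv: float          # TP / (TP + FP): positive predictive value
    npv: float          # TN / (TN + FN): negative predictive value


def _ratio(num: int, den: int) -> float:
    # Return NaN when a denominator is zero instead of raising ZeroDivisionError.
    return num / den if den else float("nan")


def evaluate_phenotype(predicted: list[bool], gold: list[bool]) -> ValidationMetrics:
    """Compare algorithm labels with gold-standard labels (assumed: chart review)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)
    fp = sum(p and not g for p, g in pairs)
    fn = sum(g and not p for p, g in pairs)
    tn = sum((not p) and (not g) for p, g in pairs)
    return ValidationMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    # Hypothetical labels for seven patients; not data from the paper.
    pred = [True, True, False, False, True, False, True]
    gold = [True, False, False, True, True, False, True]
    m = evaluate_phenotype(pred, gold)
    print(m.sensitivity, m.ppv)  # prints: 0.75 0.75
```

In practice the choice of metric depends on the intended use assessed in stage 1; for example, a cohort study may weight PPV more heavily than sensitivity.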

Discussion/conclusion: This framework is intended to provide practical guidance and serve as a basis for future elaboration and extension.