AI in predicting COPD in the Canadian population

Biosystems. 2022 Jan:211:104585. doi: 10.1016/j.biosystems.2021.104585. Epub 2021 Dec 2.

Abstract

Chronic obstructive pulmonary disease (COPD) is a progressive lung disease that causes non-reversible airflow limitation. Approximately 10% of Canadians aged 35 years or older are living with COPD. Primary care is often an individual's first point of contact with the healthcare system, providing acute care, chronic disease management, and services aimed at health maintenance. This study used Electronic Medical Record (EMR) data from primary care clinics in seven provinces across Canada to develop predictive models to identify COPD in the Canadian population. Because this primary care EMR data is comprehensive, containing structured numeric, categorical, hybrid, and unstructured text data, the predictive models can capture symptoms of COPD and discriminate it from diseases with similar symptoms. We applied two supervised machine learning models, a Multilayer Neural Network (MLNN) model and an Extreme Gradient Boosting (XGB) model, to identify COPD patients. The XGB model achieved an accuracy of 86% on the test dataset, compared to 83% for the MLNN. Using feature importance, we identified a set of key EMR features for diagnosing COPD, which included medications, health conditions, risk factors, and patient age. Applying this XGB model to structured primary care EMR data can distinguish patients with COPD from those with similar chronic conditions, supporting disease surveillance and improving evidence-based care delivery.
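
The abstract does not describe the pipeline in detail, so the following is only a minimal sketch of two ideas it names: a bag-of-words representation of unstructured EMR text (listed in the keywords) and the accuracy metric used to compare the models. It uses only the Python standard library and is not the authors' implementation. The function names, the tokenization rule, and the two example notes are hypothetical and chosen for illustration. The classifier itself (XGB or MLNN) is deliberately left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a free-text note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(notes):
    """Map every distinct token in the corpus to a column index.

    Tokens are sorted so the column order is deterministic.
    """
    tokens = sorted({tok for note in notes for tok in tokenize(note)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(note, vocabulary):
    """Return a token-count vector for one note.

    Tokens that are not in the vocabulary are ignored.
    """
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(note)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical notes, invented for illustration (not from the study data).
notes = [
    "chronic cough, smoker, tiotropium inhaler",
    "wheeze and cough, salbutamol, no smoking history",
]
vocabulary = build_vocabulary(notes)
vectors = [bag_of_words(note, vocabulary) for note in notes]
```

Each row of `vectors` could then be concatenated with structured fields such as age and passed to any supervised classifier. Word order is discarded by design, which is the defining trade-off of a bag-of-words representation.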

Keywords: Bag of words model; COPD; EMR data; Extreme gradient boosting; Feature importance; Machine learning; Medical diagnosis; Text classification.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Canada / epidemiology
  • Datasets as Topic
  • Electronic Health Records
  • Humans
  • Pulmonary Disease, Chronic Obstructive / diagnosis*
  • Pulmonary Disease, Chronic Obstructive / epidemiology