Development of a Semiautomated Database for Patients With Adult Congenital Heart Disease

Can J Cardiol. 2022 Oct;38(10):1634-1640. doi: 10.1016/j.cjca.2022.05.022. Epub 2022 May 31.

Abstract

Background: Databases for congenital heart disease (CHD) are effective in delivering accessible datasets ready for statistical inference. Data collection has, however, been labour- and time-intensive and has required substantial financial support to remain sustainable. We propose and pilot a semiautomated technique that extracts data from clinic letters to populate a clinical database.

Methods: Clinic letters in PDF format, stored in a local folder, were passed through a series of algorithms for data extraction, preprocessing, and analysis. Specific patient information (diagnoses, diagnostic complexity, interventions, arrhythmia, medications, and demographic data) was processed into text files and structured data tables, which were used to populate a database. A data validation schema was predefined to verify and accommodate the information entering the database. Unsupervised learning, in the form of a dimensionality reduction technique, was used to project the data into 2 dimensions and visualize their intrinsic structure in relation to the diagnosis, medication, intervention, and European Society of Cardiology disease-complexity classification lists. Ninety-three randomly selected letters were reviewed manually for accuracy.
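
The extraction and validation steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the letter text has already been extracted from the PDF (that step needs a third-party library and is omitted), and the letter layout, the `LetterRecord` fields, and the two validation rules are all hypothetical. The sample letter is fictitious.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical letter layout: one "Heading: item; item" line per category.
FIELD_PATTERN = re.compile(
    r"^(Diagnoses|Interventions|Medications):[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)
DOB_PATTERN = re.compile(
    r"^Date of birth:[ \t]*(\d{2}/\d{2}/\d{4})[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class LetterRecord:
    """Structured row built from one clinic letter (illustrative fields only)."""
    date_of_birth: Optional[str] = None
    diagnoses: List[str] = field(default_factory=list)
    interventions: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)


def parse_letter(text: str) -> LetterRecord:
    """Pull the date of birth and the semicolon-separated lists out of the text."""
    record = LetterRecord()
    dob = DOB_PATTERN.search(text)
    if dob:
        record.date_of_birth = dob.group(1)
    for heading, body in FIELD_PATTERN.findall(text):
        items = [item.strip() for item in body.split(";") if item.strip()]
        # The heading text doubles as the attribute name, e.g. "Diagnoses" -> diagnoses.
        getattr(record, heading.lower()).extend(items)
    return record


def validate(record: LetterRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if record.date_of_birth is None:
        problems.append("missing date of birth")
    if not record.diagnoses:
        problems.append("no diagnosis extracted")
    return problems


# Fictitious letter text, for illustration only.
sample = """Date of birth: 01/02/1987
Diagnoses: Tetralogy of Fallot; Pulmonary regurgitation
Interventions: Surgical repair
Medications: Bisoprolol
"""
record = parse_letter(sample)
problems = validate(record)
print(record)
print(problems)
```

Records that fail validation could be set aside for manual review rather than uploaded, which is one way a predefined schema can guard the database against incomplete extractions.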

Results: A total of 1409 consecutive outpatient clinic letters were used to populate the Scottish Adult Congenital Cardiac Database. Mean patient age was 35.4 years, and 47.6% of patients were female; 698 (49.5%) had moderately complex, 369 (26.1%) greatly complex, and 284 (20.1%) mildly complex lesions. Individual diagnoses were successfully extracted in 96.95% of letters, and demographic data in 100%. Data extraction, database upload, data analysis, and visualization took 571 seconds (9.51 minutes). On manual review, the accuracy of the computer algorithm in the categories of diagnoses, interventions, and medications was 94%, 93%, and 93%, respectively.
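
As a back-of-envelope reading of the timing figure, the reported run time can be converted into an average throughput. The short Python calculation below uses only the two numbers given in the Results. The per-letter value is an average over the whole pipeline (extraction, upload, analysis, and visualization), not a measured extraction time for a single letter.

```python
letters = 1409        # letters processed (from the Results)
total_seconds = 571   # end-to-end run time in seconds (from the Results)

seconds_per_letter = total_seconds / letters
letters_per_minute = letters / (total_seconds / 60)

print(f"{seconds_per_letter:.3f} s per letter")
print(f"{letters_per_minute:.0f} letters per minute")
```

This works out to roughly 0.4 seconds per letter, or about 148 letters per minute.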

Conclusions: Semiautomated data extraction from clinic letters into a database can be achieved successfully with a high degree of accuracy and efficiency.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Algorithms
  • Cardiology*
  • Data Collection
  • Databases, Factual
  • Female
  • Heart Defects, Congenital* / diagnosis
  • Heart Defects, Congenital* / therapy
  • Humans
  • Male