Development and Validation of a Machine Learning Model for Automated Assessment of Resident Clinical Reasoning Documentation

Verity Schaye; Benedict Guzman; Jesse Burk-Rafel; Marina Marin; Ilan Reinstein; David Kudlowitz; Louis Miller; Jonathan Chun; Yindalon Aphinyanaphongs

doi:10.1007/s11606-022-07526-0

J Gen Intern Med. 2022 Jul;37(9):2230-2238. doi: 10.1007/s11606-022-07526-0. Epub 2022 Jun 16.

Authors

Verity Schaye¹ ², Benedict Guzman³, Jesse Burk-Rafel³, Marina Marin³, Ilan Reinstein³, David Kudlowitz³, Louis Miller⁴, Jonathan Chun⁵, Yindalon Aphinyanaphongs³

Affiliations

¹ NYU Grossman School of Medicine, New York, NY, USA. verity.schaye@nyulangone.org.
² NYC Health & Hospitals/Bellevue, New York, NY, USA. verity.schaye@nyulangone.org.
³ NYU Grossman School of Medicine, New York, NY, USA.
⁴ Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA.
⁵ Stanford University School of Medicine, Stanford, CA, USA.

Abstract

Background: Residents receive infrequent feedback on their clinical reasoning (CR) documentation. While machine learning (ML) and natural language processing (NLP) have been used to assess CR documentation in standardized cases, no studies have described similar use in the clinical environment.

Objective: Using Kane's framework, the authors developed and validated an ML model for automated assessment of CR documentation quality in residents' admission notes.

Design, participants, main measures: Internal medicine residents' and subspecialty fellows' admission notes at one medical center from July 2014 to March 2020 were extracted from the electronic health record. Using a validated CR documentation rubric, the authors rated 414 notes for the ML development dataset. Notes were truncated to isolate the relevant portion; NLP software (cTAKES) extracted disease/disorder named entities, and human review generated CR terms. The final model had three input variables and classified notes as demonstrating low- or high-quality CR documentation. The ML model was applied to a retrospective dataset (9591 notes) for human validation and data analysis. Reliability between human and ML ratings was assessed on 205 of these notes with Cohen's kappa. CR documentation quality by post-graduate year (PGY) was evaluated with the Mantel-Haenszel test of trend.
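The human-versus-model reliability check described above rests on Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The following is a minimal sketch of that statistic and is not the authors' code: the function name `cohens_kappa` and the eight toy ratings are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies,
    # summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and 1.0 is returned here by convention (an assumption of this sketch).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy ratings, NOT data from the study.
human = ["high", "low", "low", "high", "low", "low", "high", "low"]
model = ["high", "low", "low", "low", "low", "low", "high", "high"]

kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on 6 of 8 notes, but because both label most notes "low", much of that agreement is expected by chance, so kappa is well below the raw agreement rate.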

Key results: The top-performing logistic regression model had an area under the receiver operating characteristic curve of 0.88, a positive predictive value of 0.68, and an accuracy of 0.79. Cohen's kappa was 0.67. Of the 9591 notes, 31.1% demonstrated high-quality CR documentation; quality increased from 27.0% (PGY1) to 31.0% (PGY2) to 39.0% (PGY3) (p < .001 for trend). Validity evidence was collected in each domain of Kane's framework (scoring, generalization, extrapolation, and implications).
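Positive predictive value and accuracy, two of the metrics reported above, both follow directly from a confusion matrix of reference and predicted labels. The sketch below shows the arithmetic under stated assumptions: the function name `classification_metrics`, the choice of "high" as the positive class, and the toy labels are all hypothetical, and nothing here reproduces the study's data or pipeline.

```python
def classification_metrics(y_true, y_pred, positive="high"):
    """Return PPV and accuracy for binary labels, treating `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    # PPV: of the notes the model called high quality, the fraction that truly were.
    # It is undefined (None) when the model predicts no positives at all.
    ppv = tp / (tp + fp) if (tp + fp) else None

    # Accuracy: fraction of all notes classified correctly.
    accuracy = (tp + tn) / len(pairs)
    return {"ppv": ppv, "accuracy": accuracy}


# Hypothetical toy labels, NOT data from the study.
reference = ["high", "low", "low", "high", "low", "low", "high", "low"]
predicted = ["high", "low", "low", "low", "low", "low", "high", "high"]

metrics = classification_metrics(reference, predicted)
print(metrics)
```

Unlike the area under the ROC curve, both metrics depend on the probability threshold used to turn model scores into low or high labels, so they describe one operating point of the classifier.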

Conclusions: The authors developed and validated a high-performing ML model that classifies CR documentation quality in resident admission notes in the clinical environment, a novel application of ML and NLP with many potential use cases.

Keywords: assessment; clinical reasoning; documentation; machine learning; natural language processing.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Clinical Reasoning*
Documentation*
Electronic Health Records
Humans
Machine Learning
Natural Language Processing
Reproducibility of Results
Retrospective Studies