Performance of InterVA for assigning causes of death to verbal autopsies: multisite validation study using clinical diagnostic gold standards

Rafael Lozano; Michael K Freeman; Spencer L James; Benjamin Campbell; Alan D Lopez; Abraham D Flaxman; Christopher JL Murray; Population Health Metrics Research Consortium (PHMRC)

doi:10.1186/1478-7954-9-50

Popul Health Metr. 2011 Aug 5;9:50. doi: 10.1186/1478-7954-9-50.

Affiliation

Rafael Lozano: Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA. rlozano@uw.edu.

Abstract

Background: InterVA is a widely disseminated tool for attributing cause of death from verbal autopsy data. Several studies have attempted to validate the concordance and accuracy of the tool, but they share a key limitation: they compare InterVA results with causes of death ascertained through hospital record review or hospital discharge diagnoses rather than with clinical diagnostic gold standards. This study provides a unique opportunity to assess the performance of InterVA against physician-certified verbal autopsy (PCVA) and alternative automated methods of analysis.

Methods: Using clinical diagnostic gold standards to select 12,542 verbal autopsy cases, we assessed the performance of InterVA at both the individual and population levels and compared the results with PCVA, conducting separate analyses for adults, children, and neonates. Following the recommendation of Murray et al., we randomly varied the cause composition across 500 test datasets to understand how the tool performs in different settings. We also compared InterVA with an alternative Bayesian method, Simplified Symptom Pattern (SSP), to identify the strengths and weaknesses of the tool.
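The random variation of cause composition across test datasets can be illustrated with a short sketch. It assumes that each test dataset is built by drawing a cause-fraction vector from an uninformative Dirichlet distribution and then resampling test deaths with replacement within each cause, which is our reading of the Murray et al. approach; the function name `resample_test_set` and its arguments are hypothetical, and the study's exact procedure may differ in detail.

```python
import numpy as np


def resample_test_set(test_causes, rng, n_draw=None):
    """Return resampled row indices plus the cause fractions that were drawn.

    Hypothetical helper: assumes an uninformative Dirichlet(1, ..., 1) draw
    over causes, followed by resampling with replacement within each cause.
    """
    test_causes = np.asarray(test_causes)
    causes = np.unique(test_causes)
    n_draw = len(test_causes) if n_draw is None else n_draw

    # Draw a random cause-specific mortality fraction (CSMF) vector.
    fractions = rng.dirichlet(np.ones(len(causes)))
    # Turn the fractions into integer death counts that sum to n_draw.
    counts = rng.multinomial(n_draw, fractions)

    indices = []
    for cause, count in zip(causes, counts):
        pool = np.flatnonzero(test_causes == cause)
        # Resample deaths of this cause with replacement.
        indices.append(rng.choice(pool, size=count, replace=True))
    return np.concatenate(indices), dict(zip(causes.tolist(), fractions))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    causes = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
    # 500 test datasets, each with its own random cause composition.
    datasets = [resample_test_set(causes, rng) for _ in range(500)]
    print(len(datasets), datasets[0][1])
```

Repeating the draw 500 times yields test sets whose cause mixes range from nearly uniform to highly skewed, so a method's reported performance is intended to depend less on any single cause composition.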

Results: Across all age groups, InterVA performed worse than PCVA at both the individual and population levels. At the individual level, InterVA achieved a chance-corrected concordance of 24.2% for adults, 24.9% for children, and 6.3% for neonates (excluding free text and considering only a single assigned cause). At the population level, InterVA achieved a cause-specific mortality fraction (CSMF) accuracy of 0.546 for adults, 0.504 for children, and 0.404 for neonates. The comparison with SSP revealed four characteristics that account for SSP's superior performance. Chance-corrected concordance increases by developing cause-by-cause models (2%), using all items rather than only those that map to InterVA items (7%), assigning probabilities to clusters of symptoms (6%), and using empirical rather than expert probabilities (up to 8%).
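The two metrics reported above can be sketched as follows, assuming one assigned cause per death and the definitions from Murray et al.'s validation framework: per-cause chance-corrected concordance is (sensitivity - 1/N) / (1 - 1/N) for N causes, averaged over causes, and CSMF accuracy is 1 - sum_j |CSMF_true_j - CSMF_pred_j| / (2 (1 - min_j CSMF_true_j)). The function names are hypothetical, and skipping causes that are absent from the test set is a simplification made here.

```python
import numpy as np


def chance_corrected_concordance(true, pred, causes):
    """Mean over causes of (sensitivity - 1/N) / (1 - 1/N).

    Assumes a single assigned cause per death; causes with no true
    deaths in this test set are skipped (a simplifying choice).
    """
    true = np.asarray(true)
    pred = np.asarray(pred)
    n = len(causes)
    per_cause = []
    for c in causes:
        mask = true == c
        if not mask.any():
            continue
        sensitivity = np.mean(pred[mask] == c)
        per_cause.append((sensitivity - 1.0 / n) / (1.0 - 1.0 / n))
    return float(np.mean(per_cause))


def csmf_accuracy(true, pred, causes):
    """1 - total absolute CSMF error, scaled by its maximum possible value."""
    true = np.asarray(true)
    pred = np.asarray(pred)
    csmf_true = np.array([np.mean(true == c) for c in causes])
    csmf_pred = np.array([np.mean(pred == c) for c in causes])
    error = np.abs(csmf_true - csmf_pred).sum()
    # The worst case puts all predicted deaths on the rarest true cause.
    return float(1.0 - error / (2.0 * (1.0 - csmf_true.min())))


if __name__ == "__main__":
    causes = ["A", "B"]
    true = ["A", "A", "B", "B"]
    pred = ["A", "B", "A", "B"]
    print(chance_corrected_concordance(true, pred, causes))
    print(csmf_accuracy(true, pred, causes))
```

With true causes A, A, B, B and predictions A, B, A, B, the sketch returns a chance-corrected concordance of 0.0 but a CSMF accuracy of 1.0: each individual assignment is no better than chance, yet the population-level cause fractions are exactly right. This is why individual-level and population-level performance are reported separately.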

Conclusions: Given the widespread use of verbal autopsy for understanding the burden of disease and for setting health intervention priorities in areas that lack reliable vital registration systems, accurate analysis of verbal autopsies is essential. While InterVA is an affordable and readily available mechanism for assigning causes of death from verbal autopsies, users should be aware of its suboptimal performance relative to other methods.