Medical checkup data analysis method based on LiNGAM and its application to nonalcoholic fatty liver disease

Artif Intell Med. 2022 Jun:128:102310. doi: 10.1016/j.artmed.2022.102310. Epub 2022 Apr 22.

Abstract

Although medical checkup data are useful for identifying unknown factors of disease progression, the causal relationships between checkup items must be taken into account for precise analysis. Missing values in medical checkup data must be appropriately imputed because checkup items vary from person to person, and items that were not tested appear as missing values. In addition, the number of patients with a target disease or disorder is small compared with the total number of persons recorded in the data, which makes the analysis of medical checkup data an imbalanced-data problem. We propose a new method for analyzing causal relationships in medical checkup data to discover disease progression factors, based on the linear non-Gaussian acyclic model (LiNGAM), a machine learning technique for causal inference. In the proposed method, referred to as LiNGAM-beta, specific regression coefficients calculated through LiNGAM are compared to estimate the causal strength of the checkup items on disease progression. We also propose an analysis framework consisting of LiNGAM-beta, collaborative filtering (CF), and a sampling approach for causal inference on medical checkup data. CF and the sampling approach are used for missing value imputation and for balancing the data distribution, respectively. We applied the proposed analysis framework to medical checkup data to identify factors in the development of nonalcoholic fatty liver disease (NAFLD). The checkup items related to metabolic syndrome and age showed high causal effects on NAFLD severity. The level of blood urea nitrogen (BUN) appeared to have a negative effect on NAFLD severity. Snoring frequency, which is associated with obstructive sleep apnea, affected NAFLD severity, particularly in the male group. Sleep duration also affected NAFLD severity in persons over fifty years old. These analysis results are consistent with previous reports about the causes of NAFLD; for example, NAFLD and metabolic syndrome are mutually and bidirectionally related, and BUN has a negative effect on NAFLD progression. Thus, our analysis results are plausible. The proposed analysis framework including LiNGAM-beta can be applied to various medical checkup data and will contribute to discovering unknown disease factors.
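
The idea of comparing LiNGAM regression coefficients can be illustrated with a minimal sketch. The abstract does not state which coefficients LiNGAM-beta compares or how, so everything below is an assumption made for illustration: the data are synthetic, the variable names (`x0`, `x1`, `x2`, `y`) and the helper functions are hypothetical, and the causal order is supplied by hand, whereas LiNGAM itself estimates it from the non-Gaussianity of the data. The sketch simulates the LiNGAM model x = Bx + e with uniform noise, recovers B by least-squares regression of each variable on its predecessors, and ranks the direct effects on the target by absolute size.

```python
import numpy as np


def simulate_lingam(n_samples, B, rng):
    """Draw samples from x = B x + e with independent uniform (non-Gaussian) noise."""
    d = B.shape[0]
    e = rng.uniform(-1.0, 1.0, size=(n_samples, d))
    # Solving x = B x + e gives x = (I - B)^{-1} e; apply it to every row of e.
    return e @ np.linalg.inv(np.eye(d) - B).T


def fit_coefficients(X, order):
    """Regress each variable on its predecessors in an ASSUMED causal order.

    Returns B_hat, where B_hat[i, j] is the estimated direct effect of j on i.
    """
    d = X.shape[1]
    Xc = X - X.mean(axis=0)  # center so no intercept term is needed
    B_hat = np.zeros((d, d))
    for pos, i in enumerate(order):
        parents = list(order[:pos])
        if parents:
            coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, i], rcond=None)
            B_hat[i, parents] = coef
    return B_hat


def rank_direct_effects(B_hat, target, names):
    """Sort the other variables by the absolute size of their effect on `target`."""
    effects = [(names[j], float(B_hat[target, j]))
               for j in range(len(names)) if j != target]
    return sorted(effects, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy structure (not taken from the paper): x0 -> x1, and
# x0, x1, x2 -> y, with a negative effect from x2.
names = ["x0", "x1", "x2", "y"]
B_true = np.zeros((4, 4))
B_true[1, 0] = 0.3
B_true[3, 0] = 0.4
B_true[3, 1] = 0.8
B_true[3, 2] = -0.5

rng = np.random.default_rng(0)
X = simulate_lingam(20_000, B_true, rng)
B_hat = fit_coefficients(X, order=[0, 1, 2, 3])
ranking = rank_direct_effects(B_hat, target=3, names=names)

for name, effect in ranking:
    print(f"{name}: {effect:+.3f}")
```

On real checkup data the variables would likely need to be put on a comparable scale before their coefficients are compared, and missing values and class imbalance would have to be handled first, as the framework describes.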

Keywords: Causal analysis; Collaborative filtering; Linear non-Gaussian acyclic model; Medical checkup data; Nonalcoholic fatty liver disease.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis
  • Disease Progression
  • Humans
  • Male
  • Metabolic Syndrome* / diagnosis
  • Metabolic Syndrome* / epidemiology
  • Middle Aged
  • Non-alcoholic Fatty Liver Disease* / diagnosis
  • Non-alcoholic Fatty Liver Disease* / epidemiology
  • Normal Distribution