Machine learning method using position-specific mutation based classification outperforms one hot coding for disease severity prediction in haemophilia 'A'

Genomics. 2020 Nov;112(6):5122-5128. doi: 10.1016/j.ygeno.2020.09.020. Epub 2020 Sep 11.

Abstract

Haemophilia is an X-linked genetic disorder whose most common types, A and B, are caused by the absence or deficiency of coagulation factors VIII and IX, respectively. Disease severity depends on the underlying mutation. Existing Machine Learning (ML) methods that predict mutational severity using traditional encoding approaches generally have high time complexity and compromised accuracy. In this study, a Haemophilia 'A' patient mutation dataset containing 7784 mutations was encoded with the proposed Position-Specific Mutation (PSM) technique and with One-Hot Encoding (OHE) to predict disease severity. The datasets produced by each encoding were used to train various ML algorithms to classify mutation severity level. PSM outperformed OHE in both time efficiency and accuracy, with training and prediction times improving by approximately 91 to 98% and 80 to 99%, respectively. Severity prediction accuracy also improved with PSM across the different ML algorithms.
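
To illustrate why the choice of encoding can affect time cost, the sketch below contrasts a one-hot encoding of point mutations with a compact integer encoding. The abstract does not define how PSM is constructed, so `compact_encode` is only a hypothetical stand-in and not the authors' method. The mutation tuples, the protein length `N_POSITIONS`, and all function names are assumptions made for this example.

```python
# Toy comparison of feature-vector width under two encodings.
# Assumption: a mutation is (1-based position, reference residue, variant residue).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
N_POSITIONS = 2000                    # arbitrary illustrative protein length


def one_hot_encode(mutations, n_positions):
    """One 0/1 slot per position, per reference residue, and per variant residue."""
    n_aa = len(AMINO_ACIDS)
    width = n_positions + 2 * n_aa
    vectors = []
    for pos, ref, alt in mutations:
        vec = [0] * width
        vec[pos - 1] = 1                                      # position block
        vec[n_positions + AMINO_ACIDS.index(ref)] = 1         # reference block
        vec[n_positions + n_aa + AMINO_ACIDS.index(alt)] = 1  # variant block
        vectors.append(vec)
    return vectors


def compact_encode(mutations):
    """Hypothetical compact encoding: three integers per mutation (not the paper's PSM)."""
    return [
        [pos, AMINO_ACIDS.index(ref), AMINO_ACIDS.index(alt)]
        for pos, ref, alt in mutations
    ]


# Made-up mutations for illustration only; not taken from the study's dataset.
toy_mutations = [(2, "A", "V"), (5, "R", "C"), (5, "R", "H")]

one_hot = one_hot_encode(toy_mutations, N_POSITIONS)
compact = compact_encode(toy_mutations)

print("one-hot width:", len(one_hot[0]))
print("compact width:", len(compact[0]))
```

With these assumed sizes, each one-hot vector has thousands of mostly zero entries, while the compact form has three. A classifier has to process every column during training and prediction, so a narrower representation is one plausible source of the time savings the abstract reports, although the abstract itself does not state the mechanism.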

Keywords: Factor VIII; Haemophilia; Machine learning; Mutation; One-hot encoding; Position specific mutation.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Hemophilia A / diagnosis*
  • Hemophilia A / genetics
  • Humans
  • Machine Learning*
  • Mutation*
  • Severity of Illness Index