Building a Binary Classification Machine-Learning Model: A Guide to Predicting Participation in a Lyme Disease Program at a Medical Institute

Methods Mol Biol. 2024:2742:185-237. doi: 10.1007/978-1-0716-3561-2_15.

Abstract

The fields of data analysis, data preparation, and machine learning are expanding rapidly, and a growing number of libraries and resources are available for exploration. Researchers gain knowledge through various channels, but few resources provide a comprehensive framework for building machine-learning models. To fill this gap, we present a step-by-step framework for constructing a robust Random Forest classification model. Using the trained model, we predict whether individuals who visited Sanoviv Medical Institute between 2020 and 2023 participated in the Lyme disease program, based on age, symptoms, blood count, and chemistry results. While not exhaustive, the methods in each step provide a valuable starting point for researchers and promote an understanding of the fundamental approach to model creation. The framework encourages researchers to explore beyond the outlined techniques, fostering innovation and experimentation.
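
As a rough illustration of the kind of workflow the abstract describes, the following minimal sketch trains a Random Forest binary classifier and scores it with the two metrics named in the keywords (receiver operating characteristic and precision-recall). It is not the authors' code. It assumes scikit-learn and NumPy are available, and every feature column, the label rule, and all hyperparameters are hypothetical stand-ins generated synthetically; no real patient data is used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for the feature groups named in the abstract
# (age, symptoms, blood count, chemistry). All values are made up.
age = rng.integers(18, 90, size=n)
symptom_count = rng.integers(0, 10, size=n)
white_cell_count = rng.normal(7.0, 2.0, size=n)
glucose = rng.normal(95.0, 15.0, size=n)
X = np.column_stack([age, symptom_count, white_cell_count, glucose])

# Hypothetical label rule: more symptoms makes "participated" more likely.
signal = 0.6 * (symptom_count - 4.5) + rng.normal(0.0, 1.0, size=n)
y = (signal > 0).astype(int)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the forest; n_estimators=200 is an arbitrary illustrative choice.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Predicted probability of the positive class drives both curves.
proba = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, proba)
avg_precision = average_precision_score(y_test, proba)

print(f"ROC AUC: {roc_auc:.3f}")
print(f"Average precision: {avg_precision:.3f}")
```

The same predicted probabilities could be passed to `sklearn.metrics.roc_curve` and `sklearn.metrics.precision_recall_curve` to obtain the points for plotting the curves in Matplotlib subplots.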

Keywords: Machine-learning Lyme disease; Matplotlib subplot chart panels; Precision and recall curves; Random forest classification; Receiver operating characteristic.

MeSH terms

  • Humans
  • Lyme Disease
  • Machine Learning*
  • Patient Participation*