Biomedical Literature Mining for Repurposing Laboratory Tests

Finn Kuusisto; Ross Kleiman; Jeremy Weiss

doi:10.1007/978-1-0716-2305-3_5

Methods Mol Biol. 2022:2496:91-109. doi: 10.1007/978-1-0716-2305-3_5.

Authors

Finn Kuusisto¹, Ross Kleiman², Jeremy Weiss³

Affiliations

¹ Morgridge Institute for Research, Madison, WI, USA. finn@cs.wisc.edu.
² University of Wisconsin, Madison, WI, USA.
³ Carnegie Mellon University, Pittsburgh, PA, USA.

Abstract

Epidemiological studies that identify biological markers of disease state are valuable, but they can be time-consuming and expensive, and they require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, so high-quality initial hypotheses are crucial. In this chapter, we describe a high-throughput pipeline that produces a ranked list of high-quality hypothesized biomarkers for diseases. We review an example use of this approach: generating a large number of candidate disease biomarker hypotheses from machine learning models, filtering and ranking them by potential novelty using text mining, and corroborating the most promising hypotheses with further statistical modeling. The example applies the pipeline to a large electronic health record dataset and the PubMed corpus to find several promising hypothesized laboratory tests with previously undocumented correlations to particular diseases.

Keywords: Biomarker discovery; Electronic health records; Epidemiology; Machine learning; Text mining.
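
The filter-and-rank step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes that novelty is approximated by how few articles mention a laboratory test and a disease together, and every name here (`rank_by_novelty`, `comention_counts`, `max_comentions`, the placeholder tests and diseases, and the scores and counts) is hypothetical and made up for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (lab_test, disease)


def rank_by_novelty(
    candidates: List[Tuple[str, str, float]],
    comention_counts: Dict[Pair, int],
    max_comentions: int = 5,
) -> List[Tuple[str, str, float, int]]:
    """Filter and rank candidate (lab_test, disease, model_score) triples.

    Assumption: a pair with few literature co-mentions is treated as
    potentially novel. Pairs with more than `max_comentions` co-mentions
    are dropped as already documented.
    """
    kept = []
    for lab_test, disease, score in candidates:
        # Pairs missing from the count table are treated as never co-mentioned.
        n = comention_counts.get((lab_test, disease), 0)
        if n <= max_comentions:
            kept.append((lab_test, disease, score, n))
    # Fewest co-mentions first; ties broken by higher model score.
    kept.sort(key=lambda row: (row[3], -row[2]))
    return kept


# Toy inputs (invented placeholders, not results from the chapter).
toy_candidates = [
    ("test_A", "disease_X", 0.9),
    ("test_B", "disease_X", 0.7),
    ("test_C", "disease_Y", 0.8),
]
toy_counts = {
    ("test_A", "disease_X"): 120,  # heavily documented, so filtered out
    ("test_B", "disease_X"): 0,
    ("test_C", "disease_Y"): 2,
}

ranked = rank_by_novelty(toy_candidates, toy_counts)
for row in ranked:
    print(row)
```

In a real pipeline the co-mention counts would presumably come from a literature search over PubMed rather than a hand-written dictionary, and the surviving pairs would then go on to the statistical corroboration step the abstract mentions.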

Publication types

Review
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Data Mining*
Electronic Health Records
Machine Learning*
Models, Statistical
Publications