From space to biomedicine: Enabling biomarker data science in the cloud

Cancer Biomark. 2022;33(4):479-488. doi: 10.3233/CBM-210350.

Abstract

NASA's Jet Propulsion Laboratory (JPL) is advancing data science research capabilities with two of the National Cancer Institute's major research programs, the Early Detection Research Network (EDRN) and the Molecular and Cellular Characterization of Screen-Detected Lesions (MCL), by enabling data-driven discovery for cancer biomarker research. The research team pioneered a national data science ecosystem for cancer biomarker research to capture, process, manage, share, and analyze data across multiple research centers. By collaborating on software and data-driven methods developed for space and Earth science research, the biomarker research community is leveraging similar capabilities to meet the data and computational demands of analyzing research data. This includes linking diverse data, from clinical phenotypes to imaging to genomics. The data science infrastructure captures and links data from over 1600 annotations of cancer biomarkers to terabytes of analysis results on the cloud in a biomarker data commons known as "LabCAS". As the data grow in size, it is critical to develop automated approaches that "plug" laboratories and instruments into a data science infrastructure so that data can be captured and analyzed systematically and directly. This includes applying artificial intelligence and machine learning to automate annotation and scale science analysis.

Keywords: Data science; artificial intelligence; big data; cloud computing; data analysis; machine learning.

MeSH terms

  • Artificial Intelligence*
  • Biomarkers, Tumor
  • Data Science*
  • Ecosystem
  • Humans
  • Software

Substances

  • Biomarkers, Tumor