Multiset multicover methods for discriminative marker selection

Cell Rep Methods. 2022 Nov 11;2(11):100332. doi: 10.1016/j.crmeth.2022.100332. eCollection 2022 Nov 21.

Abstract

Markers are increasingly used in high-throughput data analysis and experimental design. Examples include assigning cell types in scRNA-seq studies, deconvolving bulk gene expression data, and selecting marker proteins in single-cell spatial proteomics studies. Most marker selection methods focus on differential expression (DE) analysis. Although such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-size datasets where several cell types and tissues are considered. To address this limitation, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets. Analysis of these sets on several marker-selection tasks suggests that these methods can lead to solutions that accurately distinguish different phenotypes in the data.
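
The abstract does not spell out the PC algorithms, so what follows is only a generic sketch of a greedy heuristic for the multiset multicover problem named in the title and keywords, not the authors' method. It assumes, hypothetically, that each candidate marker covers each phenotype pair with an integer multiplicity and that every pair must be covered a required number of times. The gene names, pair labels, and the function `greedy_multiset_multicover` are invented for illustration.

```python
from typing import Dict, Hashable, List


def greedy_multiset_multicover(
    multisets: Dict[str, Dict[Hashable, int]],
    demand: Dict[Hashable, int],
) -> List[str]:
    """Greedy heuristic: repeatedly pick the multiset with the largest useful coverage.

    multisets maps a marker name to {element: multiplicity} (hypothetical layout).
    demand maps each element to the number of times it must be covered.
    """
    remaining = dict(demand)      # coverage still needed per element
    candidates = set(multisets)   # each marker may be chosen at most once
    chosen: List[str] = []

    def gain(name: str) -> int:
        # Coverage beyond an element's remaining demand does not count.
        return sum(
            min(mult, remaining.get(elem, 0))
            for elem, mult in multisets[name].items()
        )

    while candidates and any(r > 0 for r in remaining.values()):
        # Sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=gain)
        if gain(best) == 0:
            break  # no remaining marker helps; demand cannot be fully met
        chosen.append(best)
        candidates.remove(best)
        for elem, mult in multisets[best].items():
            if elem in remaining:
                remaining[elem] = max(0, remaining[elem] - mult)
    return chosen


# Toy input: three phenotype pairs, each needing coverage 2 (made-up values).
multisets = {
    "geneA": {("T", "B"): 2, ("T", "NK"): 1},
    "geneB": {("T", "NK"): 2, ("B", "NK"): 1},
    "geneC": {("B", "NK"): 2, ("T", "B"): 1},
    "geneD": {("T", "B"): 1},
}
demand = {("T", "B"): 2, ("T", "NK"): 2, ("B", "NK"): 2}

markers = greedy_multiset_multicover(multisets, demand)
print(markers)
```

On this toy input the greedy pass selects geneA, geneB, and geneC, which together meet every pair's demand; geneD is never needed. Capping each gain at the remaining demand is the design choice that distinguishes multicover from plain set cover, since a marker earns no credit for re-covering a pair that is already sufficiently separated.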

Keywords: algorithm; biomarker; cross-entropy method; gene sets; marker discovery; multiset multicover; phenotype cover; scRNA-seq; set cover.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Gene Expression Profiling* / methods
  • Phenotype
  • Single-Cell Analysis* / methods