Cell population identification using fluorescence-minus-one controls with a one-class classifying algorithm

Kristen Feher; Jenny Kirsch; Andreas Radbruch; Hyun-Dong Chang; Toralf Kaiser

doi:10.1093/bioinformatics/btu575

Bioinformatics. 2014 Dec 1;30(23):3372-8. doi: 10.1093/bioinformatics/btu575. Epub 2014 Aug 27.

Authors

Kristen Feher¹, Jenny Kirsch¹, Andreas Radbruch¹, Hyun-Dong Chang¹, Toralf Kaiser¹

Affiliation

¹ Deutsches Rheuma-Forschungszentrum, Berlin 10117, Germany.

PMID: 25170025
DOI: 10.1093/bioinformatics/btu575

Abstract

Motivation: The tried and true approach to flow cytometry data analysis is to manually gate on each biomarker separately, which is feasible for a small number of biomarkers, e.g. fewer than five. However, this rapidly becomes confusing as the number of biomarkers increases. Furthermore, multivariate structure is not taken into account. Recently, automated gating algorithms have been implemented, all of which rely on unsupervised learning methodology. However, all unsupervised learning outputs suffer the same difficulties in validation in the absence of external knowledge, regardless of application domain.

Results: We present a new semi-automated algorithm for population discovery that is based on comparison to fluorescence-minus-one controls, thus transferring the problem into that of one-class classification, as opposed to being an unsupervised learning problem. The novel one-class classification algorithm is based on common principal components and can accommodate complex mixtures of multivariate densities. Computational time is short, and the simple nature of the calculations means the algorithm can easily be adapted to process large numbers of cells (10⁶). Furthermore, we are able to find rare cell populations as well as populations with low biomarker concentration, both of which are inherently hard to do in an unsupervised learning context without prior knowledge of the samples' composition.
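
The one-class idea in the Results can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single fluorescence-minus-one control, uses ordinary principal components of that control rather than common principal components, and works on synthetic data. The function names, the 0.999 quantile and the simulated intensities are all hypothetical choices made for illustration. Cells in the stained sample are flagged when their distance in the control's principal-component space exceeds what is seen among the control cells themselves.

```python
import numpy as np


def fit_control_model(control, quantile=0.999):
    """Fit a simple PCA-based one-class model to control events (rows = cells).

    Assumption: plain PCA on one control, not the paper's common principal
    components. The threshold is an empirical quantile of control distances.
    """
    mean = control.mean(axis=0)
    centered = control - mean
    # Eigendecomposition of the control covariance gives the principal axes.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, 1e-12)  # guard against zero variance
    # Variance-scaled squared distance of each control cell from the mean.
    scores = centered @ eigvecs
    dist = np.sum(scores**2 / eigvals, axis=1)
    threshold = np.quantile(dist, quantile)
    return mean, eigvecs, eigvals, threshold


def flag_outside_control(sample, model):
    """Return a boolean mask of sample cells lying outside the control class."""
    mean, eigvecs, eigvals, threshold = model
    scores = (sample - mean) @ eigvecs
    dist = np.sum(scores**2 / eigvals, axis=1)
    return dist > threshold


# Synthetic data (hypothetical): three channels, with a rare population
# that is shifted in the first channel only.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(5000, 3))
negative = rng.normal(0.0, 1.0, size=(4950, 3))
positive = rng.normal([6.0, 0.0, 0.0], 1.0, size=(50, 3))
sample = np.vstack([negative, positive])

model = fit_control_model(control)
flags = flag_outside_control(sample, model)
print("flagged cells:", int(flags.sum()), "of", flags.size)
```

Because the decision boundary is learned from the control alone, the rare shifted population (1% of the sample here) does not need to form a detectable cluster of its own, which is the property the abstract contrasts with unsupervised gating.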

Availability and implementation: R scripts are available via https://fccf.mpiib-berlin.mpg.de/daten/drfz/bioinformatics/ with {username,password}={bioinformatics,Sar=Gac4}.

MeSH terms

Algorithms*
Biomarkers / analysis
Cluster Analysis
Flow Cytometry / methods*
Fluorescence
Humans
Support Vector Machine

Substances

Biomarkers