HelPredictor models single-cell transcriptome to predict human embryo lineage allocation

Brief Bioinform. 2021 Nov 5;22(6):bbab196. doi: 10.1093/bib/bbab196.

Abstract

The pursuit of an in-depth understanding of cell fate decisions in human preimplantation embryos has prompted investigations into how lineage allocation changes, a question that is far from trivial and remains time-consuming to address by experimental methods. It is therefore desirable to develop an effective bioinformatics strategy that accounts for transitions in coordinated embryo lineage allocation and for stage-specific patterns. Machine learning models are increasingly applied to interpret complex datasets and to identify candidate development-related factors and lineage-determining molecular events. Here we developed the first machine learning platform, HelPredictor, which integrates three feature selection methods (principal components analysis, the F-score algorithm and the squared coefficient of variation) with four classical machine learning classifiers; each combination of feature selection method and classifier produces an independent output through the incremental feature selection method. Applied to single-cell sequencing data of human embryos, HelPredictor not only achieved 94.9% and 90.9% in cross-validation and independent testing, respectively, but also rapidly classified different embryonic lineages and their developmental trajectories using fewer HelPredictor-predicted factors. These candidate lineage-specific genes are discussed in detail and were clustered to explore transitions in embryonic heterogeneity. Our tool can quickly and efficiently reveal potential lineage-specific and stage-specific biomarkers and provides insight into how advanced computational tools contribute to developmental research. The source code is available at https://github.com/liameihao/HelPredictor.
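To make the idea of feature ranking followed by incremental feature selection concrete, the following is a minimal sketch and not HelPredictor's actual code. It assumes a commonly used multi-class form of the F-score (between-class scatter of the class means divided by the summed within-class sample variances), uses a nearest-centroid classifier with leave-one-out validation as a stand-in for the four classifiers (which the abstract does not name), and omits the principal components analysis option. All function names, the toy expression matrix and the labels "A" and "B" are hypothetical.

```python
from statistics import mean, variance

def f_score(values, labels):
    """Assumed multi-class F-score: between-class scatter / within-class variance."""
    overall = mean(values)
    between, within = 0.0, 0.0
    for c in set(labels):
        group = [v for v, l in zip(values, labels) if l == c]
        between += (mean(group) - overall) ** 2
        within += variance(group)  # sample variance; needs >= 2 cells per class
    return between / within if within > 0 else float("inf")

def cv2(values):
    """Squared coefficient of variation: variance / mean^2."""
    m = mean(values)
    return variance(values) / m ** 2 if m != 0 else 0.0

def rank(scores):
    """Feature indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean profile."""
    best_label, best_dist = None, float("inf")
    for c in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == c]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = c, dist
    return best_label

def loo_accuracy(X, y):
    """Leave-one-out accuracy of the stand-in classifier."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

def incremental_feature_selection(X, y, ranking):
    """Add ranked features one at a time; keep the smallest best-scoring subset."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranking) + 1):
        subset = [[row[j] for j in ranking[:k]] for row in X]
        acc = loo_accuracy(subset, y)
        if acc > best_acc:  # strict '>' prefers fewer features on ties
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc

# Toy expression matrix (made up): 6 cells x 3 genes, two hypothetical lineages.
X = [
    [5.0, 1.0, 3.0],
    [5.2, 3.0, 0.5],
    [4.8, 2.0, 2.0],
    [1.0, 1.5, 2.5],
    [1.2, 2.5, 1.0],
    [0.8, 2.0, 3.5],
]
y = ["A", "A", "A", "B", "B", "B"]

columns = [list(col) for col in zip(*X)]
f_ranking = rank([f_score(col, y) for col in columns])
cv2_ranking = rank([cv2(col) for col in columns])
selected, accuracy = incremental_feature_selection(X, y, f_ranking)
print("F-score ranking:", f_ranking)
print("CV^2 ranking:", cv2_ranking)
print("selected genes:", selected, "LOO accuracy:", accuracy)
```

In this sketch each ranking method can be paired with any classifier, and the selection loop reports the smallest gene subset that reaches the best validation accuracy, which mirrors the abstract's point that fewer predicted factors can suffice for classification.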

Keywords: cell identity; feature selection; lineage allocation; machine learning; single-cell RNA sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cell Lineage / genetics*
  • Computational Biology / methods*
  • Embryonic Development / genetics*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Developmental
  • Humans
  • Machine Learning
  • Reproducibility of Results
  • Single-Cell Analysis / methods*
  • Software*
  • Transcriptome*
  • Workflow