A fast and globally optimal solution for RNA-seq quantification

Brief Bioinform. 2023 Sep 20;24(5):bbad298. doi: 10.1093/bib/bbad298.

Abstract

Alignment-based RNA-seq quantification methods typically involve a time-consuming alignment process prior to estimating transcript abundances. In contrast, alignment-free RNA-seq quantification methods bypass this step, resulting in significant speed improvements. Existing alignment-free methods rely on the Expectation-Maximization (EM) algorithm for estimating transcript abundances. However, EM algorithms only guarantee locally optimal solutions, leaving room for further accuracy improvement by finding a globally optimal solution. In this study, we present TQSLE, the first alignment-free RNA-seq quantification method that provides a globally optimal solution for transcript abundances estimation. TQSLE adopts a two-step approach: first, it constructs a k-mer frequency matrix A for the reference transcriptome and a k-mer frequency vector b for the RNA-seq reads; then, it directly estimates transcript abundances by solving the linear equation ATAx = ATb. We evaluated the performance of TQSLE using simulated and real RNA-seq data sets and observed that, despite comparable speed to other alignment-free methods, TQSLE outperforms them in terms of accuracy. TQSLE is freely available at https://github.com/yhg926/TQSLE.

Keywords: RNA-seq quantification; alignment-free; globally optimal.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Gene Expression Profiling / methods
  • RNA-Seq
  • Sequence Analysis, RNA / methods
  • Software
  • Transcriptome*