Statistical Assessment of Depth Normalization for Small RNA Sequencing

Li-Xuan Qin; Jian Zou; Jiejun Shi; Ann Lee; Aleksandra Mihailovic; Thalia A Farazi; Thomas Tuschl; Samuel Singer

doi:10.1200/CCI.19.00118

JCO Clin Cancer Inform. 2020 Jun:4:567-582. doi: 10.1200/CCI.19.00118.

Authors

Li-Xuan Qin¹, Jian Zou¹, Jiejun Shi¹, Ann Lee², Aleksandra Mihailovic³, Thalia A Farazi³, Thomas Tuschl³, Samuel Singer²

Affiliations

¹ Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY.
² Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY.
³ Laboratory of RNA Molecular Biology, The Rockefeller University, New York, NY.

Abstract

Purpose: Methods for depth normalization have been assessed primarily with simulated data or cell-line-mixture data. There is a pressing need for benchmark data enabling a more realistic and objective assessment, especially in the context of small RNA sequencing.

Methods: We collected a unique pair of microRNA sequencing data sets for the same set of tumor samples; one data set was collected with and the other without uniform handling and balanced design. The former provided a benchmark for evaluating evidence of differential expression and the latter served as a test bed for normalization. Next, we developed a data perturbation algorithm to simulate additional data set pairs. Last, we assembled a set of computational tools to visualize and quantify the assessment.
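The role of the benchmark can be sketched as follows: differential expression calls made on the normalized test data are compared with calls made on the benchmark data, which stand in for the truth. This is only a minimal illustration under stated assumptions, not the authors' tools; the function name `assess_calls` and the marker names are hypothetical, and the benchmark calls are simply treated as true positives.

```python
def assess_calls(benchmark_de, test_de, all_markers):
    """Compare DE calls from normalized test data against benchmark DE calls.

    All three arguments are collections of marker names (hypothetical inputs).
    Benchmark calls are treated as the reference truth, which is an assumption
    of this sketch.
    """
    benchmark_de = set(benchmark_de)
    test_de = set(test_de)
    all_markers = set(all_markers)

    tp = len(benchmark_de & test_de)                # called in both data sets
    fp = len(test_de - benchmark_de)                # called only in test data
    fn = len(benchmark_de - test_de)                # missed by test data
    tn = len(all_markers - benchmark_de - test_de)  # called in neither

    # Rates are undefined when their denominator is zero.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "tpr": tpr, "fdr": fdr}


# Toy example with made-up marker names.
markers = ["miR-a", "miR-b", "miR-c", "miR-d", "miR-e", "miR-f"]
result = assess_calls(
    benchmark_de=["miR-a", "miR-b", "miR-c"],
    test_de=["miR-b", "miR-c", "miR-d"],
    all_markers=markers,
)
```

A normalization method that raises `tp` while also raising `fp` would show the kind of trade-off between true and false positives that such a comparison is meant to expose.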

Results: We validated the quality of the benchmark data and showed the need for normalization of the test data. For illustration, we applied the data and tools to assess the performance of 9 existing normalization methods. Among them, trimmed mean of M-values was a better scaling method, whereas median and upper-quartile scaling were consistently the worst performers; one variant of remove unwanted variation had the best chance of capturing true positives, but at the cost of increased false positives. In general, these methods were, at best, moderately helpful when the level of differential expression was extensive and asymmetric.
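For readers unfamiliar with scaling normalization, the two simplest methods named above can be sketched as follows. This is a generic illustration, not the implementation evaluated in the study: each sample's counts are divided by a per-sample factor derived from the median or the upper quartile of its nonzero counts. The function names, the exclusion of zeros, and the rescaling of factors to a mean of 1 are assumptions of the sketch.

```python
from statistics import median, quantiles


def scale_factors(counts, method="median"):
    """Return one scaling factor per sample.

    counts: list of samples, each a list of per-marker read counts.
    Assumes every sample has at least two nonzero counts.
    """
    factors = []
    for sample in counts:
        nonzero = [c for c in sample if c > 0]  # ignore unexpressed markers
        if method == "median":
            factors.append(median(nonzero))
        elif method == "upper_quartile":
            # Third of the three quartile cut points is the 75th percentile.
            factors.append(quantiles(nonzero, n=4, method="inclusive")[2])
        else:
            raise ValueError(f"unknown method: {method}")
    mean_factor = sum(factors) / len(factors)
    # Rescale so the factors average to 1 and counts keep their overall scale.
    return [f / mean_factor for f in factors]


def normalize(counts, method="median"):
    """Divide each sample's counts by its scaling factor."""
    factors = scale_factors(counts, method)
    return [[c / f for c in sample] for sample, f in zip(counts, factors)]


# Toy data: the second sample was sequenced at twice the depth of the first.
toy = [[0, 10, 20, 30, 40], [0, 20, 40, 60, 80]]
by_median = normalize(toy, "median")
by_uq = normalize(toy, "upper_quartile")
```

In this toy case the only difference between samples is depth, so either factor aligns them. A single per-sample factor of this kind can be distorted when many markers change in the same direction, which is consistent with the abstract's point about extensive and asymmetric differential expression.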

Conclusion: Our study (1) provides the much-needed benchmark data and computational tools for assessing depth normalization, (2) shows the dependence of normalization performance on the underlying pattern of differential expression, and (3) calls for continued research efforts to develop more effective normalization methods.
