Combined statistical modeling enables accurate mining of circadian transcription

NAR Genom Bioinform. 2021 Apr 26;3(2):lqab031. doi: 10.1093/nargab/lqab031. eCollection 2021 Jun.

Abstract

Circadian-regulated genes are essential for tissue homeostasis and organismal function, and are therefore common targets of scrutiny. Detection of rhythmic genes using current analytical tools requires exhaustive sampling, a demand that is costly and raises ethical concerns, making it unfeasible in certain mammalian systems. Several non-parametric methods have been commonly used to analyze short-term (24 h) circadian data, such as JTK_cycle and MetaCycle. However, algorithm performance varies greatly depending on various biological and technical factors. Here, we present CircaN, an ad-hoc implementation of a non-linear mixed model for the identification of circadian genes in all types of omics data. Based on the variable but complementary results obtained through several biological and in silico datasets, we propose a combined approach of CircaN and non-parametric models to dramatically increase the number of circadian genes detected, without affecting accuracy. We also introduce an R package to make this approach available to the community.
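
The curve-fitting idea behind parametric rhythm detection can be illustrated with a minimal sketch. It is not the CircaN implementation or the API of its R package: it only fits a single sinusoid by least squares while scanning candidate periods, on synthetic noiseless data. The function names, the sampling scheme (every 4 h), and the period grid are all assumptions made for illustration, and no mixed-effects terms or significance testing are included.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - rest) / m[r][r]
    return x

def fit_fixed_period(times, values, period):
    """Least-squares fit of mesor + a*cos(w t) + b*sin(w t) for one period.

    Returns the coefficients (mesor, a, b) and the residual sum of squares.
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    coeffs = solve3(xtx, xty)
    rss = sum(
        (y - sum(c * x for c, x in zip(coeffs, r))) ** 2
        for r, y in zip(rows, values)
    )
    return coeffs, rss

def fit_sinusoid(times, values, periods):
    """Scan candidate periods and keep the fit with the smallest residual."""
    best = None
    for period in periods:
        (mesor, a, b), rss = fit_fixed_period(times, values, period)
        if best is None or rss < best["rss"]:
            w = 2.0 * math.pi / period
            best = {
                "period": period,
                "mesor": mesor,
                "amplitude": math.hypot(a, b),
                # a*cos(wt) + b*sin(wt) peaks where w*t equals atan2(b, a)
                "peak_time": (math.atan2(b, a) / w) % period,
                "rss": rss,
            }
    return best

# Synthetic (hypothetical) expression profile: 24 h rhythm peaking at t = 6 h.
times = [4.0 * i for i in range(12)]
values = [10.0 + 3.0 * math.cos(2.0 * math.pi * (t - 6.0) / 24.0) for t in times]

result = fit_sinusoid(times, values, periods=[20 + i for i in range(9)])
print(result)
```

In a real analysis, the goodness of fit for each gene would have to be turned into a significance call and corrected for multiple testing, which this sketch deliberately leaves out.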