Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics

Ming-Hua Chung; Yuping Wang; Hailin Tang; Wen Zou; John Basinger; Xiaowei Xu; Weida Tong

doi:10.3389/fphar.2015.00081

Front Pharmacol. 2015 Apr 20:6:81. doi: 10.3389/fphar.2015.00081. eCollection 2015.

Authors

Ming-Hua Chung¹, Yuping Wang², Hailin Tang², Wen Zou², John Basinger², Xiaowei Xu³, Weida Tong²

Affiliations

¹ Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR, USA.
² Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA.
³ Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR, USA.

Abstract

Advances in high-throughput screening technologies have facilitated the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet researchers still rely heavily on keyword searches and/or literature reviews to navigate the databases, and analyses are often done on a rather small scale. As a result, the rich information in a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. Over the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning approach for annotating the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, such as probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset consisting of a large number of gene expression profiles from the livers of rats treated with drugs at multiple doses and time points. We discovered hidden patterns in gene expression associated with the effects of dose and time point of treatment. Finally, we illustrated the ability of our model to identify evidence supporting a potential reduction in animal use.

Keywords: TG-GATEs; author-topic model; bioinformatics; machine learning; probabilistic topic modeling; toxicogenomics.
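
Illustrative sketch: the text-corpus analogy described in the abstract can be made concrete with a small example. The code below is a minimal sketch of the standard author-topic model fitted by collapsed Gibbs sampling, not the asymmetric variant developed in the paper. It assumes that each sample plays the role of a document, each gene plays the role of a word, and each dose or time-point label plays the role of an author. The function name `fit_author_topic`, the toy counts, the label names, and the hyperparameters `alpha` and `beta` are all hypothetical and chosen only for illustration.

```python
import random

def fit_author_topic(docs, doc_authors, n_topics, n_words, n_authors,
                     alpha=0.5, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for the standard author-topic model.

    docs: list of token lists (gene ids, repeated by count).
    doc_authors: list of author-id lists (treatment labels per sample).
    Returns theta, an n_authors x n_topics matrix of topic proportions.
    """
    rng = random.Random(seed)
    n_wk = [[0] * n_topics for _ in range(n_words)]    # gene-topic counts
    n_k = [0] * n_topics                               # tokens per topic
    n_ak = [[0] * n_topics for _ in range(n_authors)]  # label-topic counts
    n_a = [0] * n_authors                              # tokens per label

    def update(w, a, k, step):
        # Add (step=+1) or remove (step=-1) one token from all count tables.
        n_wk[w][k] += step
        n_k[k] += step
        n_ak[a][k] += step
        n_a[a] += step

    # Random initial (author, topic) assignment for every token.
    assign = []
    for d, words in enumerate(docs):
        row = []
        for w in words:
            a, k = rng.choice(doc_authors[d]), rng.randrange(n_topics)
            update(w, a, k, +1)
            row.append((a, k))
        assign.append(row)

    for _ in range(n_iter):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                a, k = assign[d][i]
                update(w, a, k, -1)  # exclude the current token
                pairs, weights = [], []
                for a2 in doc_authors[d]:
                    for k2 in range(n_topics):
                        p_word = (n_wk[w][k2] + beta) / (n_k[k2] + n_words * beta)
                        p_topic = (n_ak[a2][k2] + alpha) / (n_a[a2] + n_topics * alpha)
                        pairs.append((a2, k2))
                        weights.append(p_word * p_topic)
                # Jointly resample the author and topic of this token.
                a, k = rng.choices(pairs, weights=weights)[0]
                update(w, a, k, +1)
                assign[d][i] = (a, k)

    return [[(n_ak[a][k] + alpha) / (n_a[a] + n_topics * alpha)
             for k in range(n_topics)] for a in range(n_authors)]

# Hypothetical treatment labels acting as "authors".
labels = ["low dose", "high dose", "3 hr", "24 hr"]

# Hypothetical toy data: per-sample counts for 4 genes, assumed to be
# discretized expression changes, plus the labels attached to each sample.
samples = [
    ([0, 2], [5, 4, 0, 1]),  # low dose, 3 hr
    ([0, 3], [4, 5, 1, 0]),  # low dose, 24 hr
    ([1, 2], [1, 0, 5, 4]),  # high dose, 3 hr
    ([1, 3], [0, 1, 4, 5]),  # high dose, 24 hr
]
doc_authors = [authors for authors, _ in samples]
docs = [[g for g, c in enumerate(counts) for _ in range(c)]
        for _, counts in samples]

theta = fit_author_topic(docs, doc_authors, n_topics=2,
                         n_words=4, n_authors=len(labels))
for name, row in zip(labels, theta):
    print(name, [round(p, 2) for p in row])
```

Each row of `theta` is the estimated topic mixture for one treatment label, which is the kind of quantity that could be compared across doses and time points. How expression values are converted to counts is an assumption here and would need to follow the paper's own preprocessing.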