Causal relationship inference for a large-scale cellular network

Bioinformatics. 2010 Aug 15;26(16):2020-8. doi: 10.1093/bioinformatics/btq325. Epub 2010 Jun 16.

Abstract

Motivation: Cellular networks usually consist of numerous chemical species, such as DNA, RNA, proteins and small molecules. Different biological tasks are generally performed through complex interactions among these species. Because these interactions can rarely be measured directly, it is widely recognized that identifying causal relationships is essential to understanding the biological behavior of a cellular network. The challenges include not only the large number of interactions to be estimated, but also the many restrictions on probing signals. The purpose of this study is to incorporate the power law into cellular network identification in order to increase the accuracy of causal regulation estimates and, in particular, to reduce false positive errors.

Results: Two identification algorithms are developed that can be efficiently applied to causal regulation identification of a large-scale network from noisy steady-state experimental data. A distinguishing feature of these algorithms is that the power law, an important structural property that most large-scale cellular networks approximately have, is explicitly incorporated into the estimation. Under the condition that the parameters of the power law are known and measurement errors are Gaussian, a likelihood maximization approach is adopted. The developed estimation algorithms consist of three major steps. First, angle minimization between subspaces is used to identify the chemical elements that directly influence a prescribed chemical element, under the condition that the number of direct regulations is known. Second, interference coefficients from prescribed chemical elements are estimated through likelihood maximization with respect to measurement errors. Finally, the numbers of direct regulations are identified by maximizing a lower bound of an overall likelihood function. These methods have been applied to an artificially constructed linear system with 100 elements, a mitogen-activated protein kinase pathway model with 103 chemical elements, some DREAM initiative in silico data and some in vivo data. Compared with the widely adopted total least squares (TLS) method, computational results show that parametric estimation accuracy can be significantly increased and false positive errors can be greatly reduced.
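
For readers unfamiliar with the TLS baseline named above, the following is a minimal sketch of classical total least squares for a single regression problem A x ≈ b, computed from the singular value decomposition of the augmented matrix [A | b]. It is not the authors' code and does not implement the power-law-based algorithms; the function name `tls_solve`, the tolerance and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def tls_solve(A, b, tol=1e-12):
    """Classical total least squares estimate for A x ≈ b.

    Illustrative helper (hypothetical name): unlike ordinary least squares,
    TLS allows errors in both the regressor matrix A and the response b.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]

    # Right singular vectors of the augmented matrix [A | b];
    # numpy returns singular values in descending order, so the last
    # row of vt belongs to the smallest singular value.
    _, _, vt = np.linalg.svd(np.hstack([A, b]), full_matrices=True)
    v = vt[-1]

    # The TLS solution exists only if the last component is non-zero.
    if abs(v[n]) < tol:
        raise ValueError("TLS solution does not exist for this data")
    return -v[:n] / v[n]


# Synthetic, noise-free example (assumed data, not from the paper).
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
x_hat = tls_solve(A_demo, A_demo @ x_true)
```

In a network identification setting, one such problem would typically be solved for each prescribed element, with the columns of A holding the measured steady-state responses of the candidate regulators; that mapping is an assumption here, not a detail given in the abstract.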

Availability: The Matlab files for the methods are available at http://bioinfo.au.tsinghua.edu.cn/member/ylwang/Matlabfiles_CNI.zip.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Least-Squares Analysis
  • Likelihood Functions
  • MAP Kinase Signaling System
  • Models, Biological*
  • Probability
  • Regression Analysis