Reliable scaling of position weight matrices for binding strength comparisons between transcription factors

Xiaoyan Ma; Daphne Ezer; Carmen Navarro; Boris Adryan

doi:10.1186/s12859-015-0666-1


BMC Bioinformatics. 2015 Aug 20;16:265. doi: 10.1186/s12859-015-0666-1.

Authors

Xiaoyan Ma¹ ², Daphne Ezer³ ⁴, Carmen Navarro⁵ ⁶, Boris Adryan⁷ ⁸

Affiliations

¹ Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK. xm227@cam.ac.uk.
² Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK. xm227@cam.ac.uk.
³ Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK. de276@cam.ac.uk.
⁴ Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK. de276@cam.ac.uk.
⁵ Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK. cnluzon@decsai.ugr.es.
⁶ Department of Computer Science and Artificial Intelligence, University of Granada, Periodista Daniel Saucedo Aranda, Granada, Spain. cnluzon@decsai.ugr.es.
⁷ Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK. ba255@cam.ac.uk.
⁸ Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK. ba255@cam.ac.uk.

Abstract

Background: Scoring DNA sequences against Position Weight Matrices (PWMs) is a widely adopted method for identifying putative transcription factor binding sites. Although common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and DNA, these scores are not directly comparable between different transcription factors. Other methods, including p-value-based approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2:15. doi:10.1186/1748-7188-2-15), provide more rigorous ways to identify potential binding sites, but their results are difficult to interpret in terms of binding energy, which is essential for modeling transcription factor binding dynamics and enhancer activities.
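To make the scoring step concrete, the sketch below builds a log-odds PWM from a count matrix, scans a sequence for its best-scoring window, and computes the p-value of a score threshold. It is a minimal illustration, not the authors' code: the pseudocount of 0.5, the uniform background, the forward-strand-only scan, and the helper names (`log_odds_pwm`, `best_site`, `exact_pvalue`) are all assumptions. The p-value here is obtained by brute-force enumeration of every word of motif length, which is only feasible for short motifs; the cited Touzet and Varré work computes such p-values far more efficiently.

```python
import itertools
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """Turn a count matrix (one {base: count} dict per position) into a
    log2-odds PWM. Pseudocount and uniform background are assumptions."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores of a window of motif length."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_site(pwm, sequence):
    """Return (start, score) of the highest-scoring window (forward strand only)."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[start:start + width])
        if best is None or s > best[1]:
            best = (start, s)
    return best

def exact_pvalue(pwm, threshold, background=None, eps=1e-9):
    """P(score >= threshold) for a random i.i.d. background word, by brute-force
    enumeration of all 4**len(pwm) words (illustration for short motifs only)."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    total = 0.0
    for word in itertools.product(BASES, repeat=len(pwm)):
        if score_window(pwm, word) >= threshold - eps:
            prob = 1.0
            for base in word:
                prob *= background[base]
            total += prob
    return total
```

For a toy motif whose counts favor `ACG`, `best_site(pwm, "TTACGTT")` locates the window at position 2, and the p-value of that maximal score under a uniform background is the probability of drawing `ACG` itself, 1/64. The score and the p-value both rank sites for one PWM, but neither carries a physical energy scale that would let two different transcription factors be compared.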

Results: Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor, which we applied to show that λ distributions for different transcription factor families correspond with their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs that were generated by different approaches.
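The role of λ can be illustrated under a commonly used Boltzmann-type assumption in which a site's relative affinity is proportional to exp(λ·S), so that binding energy in units of kT is −λ·S up to an additive constant. This is an assumption for illustration and may differ from the exact formulation in the paper; the sketch does not reproduce either of the paper's estimation procedures. The functions `relative_affinity`, `occupancy`, and `convert_lambda`, the offset `mu`, and the transcription factors `TF_X` and `TF_Y` are all hypothetical.

```python
import math

def relative_affinity(score, lam):
    # Assumed relation: affinity is proportional to exp(lam * score),
    # i.e. binding energy (in kT) = -lam * score + constant.
    return math.exp(lam * score)

def occupancy(score, lam, mu=0.0):
    # Hypothetical two-state occupancy; mu is an illustrative
    # concentration/chemical-potential offset, not a parameter from the paper.
    return 1.0 / (1.0 + math.exp(-(lam * score - mu)))

def convert_lambda(lam_a, scale):
    # If PWM B scores every site as scale * (PWM A score), the energies
    # lam_b * (scale * S) and lam_a * S agree only when lam_b = lam_a / scale.
    return lam_a / scale

# Two hypothetical TFs with the same raw PWM score but different lambdas
# imply very different affinities, so raw scores alone are not comparable.
site_score = 10.0
for name, lam in [("TF_X", 0.3), ("TF_Y", 0.9)]:
    print(name, relative_affinity(site_score, lam))
```

`convert_lambda` covers only the simplest case, where two PWMs for the same factor differ by a constant factor, for example log2-odds versus natural-log-odds scores (scale = ln 2). PWMs produced by different experimental or computational approaches generally do not differ by a single constant, which is why a dedicated conversion method such as the one described above is needed.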

Conclusion: These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. Their results are consistent with each other and with previous reports in most cases.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Animals
Binding Sites
Chromatin Immunoprecipitation
Computational Biology / methods*
DNA / metabolism*
Drosophila melanogaster / genetics
Drosophila melanogaster / metabolism
Humans
Position-Specific Scoring Matrices*
Protein Binding
Saccharomyces cerevisiae / genetics
Saccharomyces cerevisiae / metabolism
Sequence Analysis, DNA / methods*
Transcription Factors / metabolism*
Vertebrates / genetics
Vertebrates / metabolism

Substances

Transcription Factors
DNA