Computational identification of protein binding sites on RNAs using high-throughput RNA structure-probing data

Xihao Hu; Thomas K F Wong; Zhi John Lu; Ting Fung Chan; Terrence Chi Kong Lau; Siu Ming Yiu; Kevin Y Yip

doi:10.1093/bioinformatics/btt757

Bioinformatics. 2014 Apr 15;30(8):1049-1055. doi: 10.1093/bioinformatics/btt757. Epub 2013 Dec 27.

Authors

Xihao Hu¹, Thomas K F Wong², Zhi John Lu¹, Ting Fung Chan², Terrence Chi Kong Lau¹, Siu Ming Yiu¹, Kevin Y Yip²

Affiliations

¹ Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, CSIRO Ecosystem Sciences, Canberra, ACT 2601, Australia, MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China 100084, School of Life Sciences, Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong and Department of Biology and Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong.
² Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, CSIRO Ecosystem Sciences, Canberra, ACT 2601, Australia, MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China 100084, School of Life Sciences, Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong and Department of Biology and Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong.

PMID: 24376038
DOI: 10.1093/bioinformatics/btt757

Abstract

Motivation: High-throughput sequencing has been used to probe RNA structures, by treating RNAs with reagents that preferentially cleave or mark certain nucleotides according to their local structures, followed by sequencing of the resulting fragments. The data produced contain valuable information for studying various RNA properties.

Results: We developed methods for statistically modeling these structure-probing data and extracting structural features from them. We show that the extracted features can be used to predict RNA 'zipcodes' in yeast, regions bound by the She complex in asymmetric localization. The prediction accuracy was higher than that obtained using raw RNA probing data or sequence features. We further demonstrate the use of the extracted features in identifying binding sites of RNA-binding proteins from whole-transcriptome global photoactivatable-ribonucleoside-enhanced cross-linking and immunopurification (gPAR-CLIP) data.
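
Illustrative sketch: the abstract does not specify the statistical model or the exact features, so the following is only a rough illustration of what turning a per-nucleotide probing signal into window-level features could look like. It is not the authors' method. The function name `window_features`, the window size, the choice of mean and standard deviation as summaries, and the toy values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def window_features(scores, window=5):
    """Summarize a per-nucleotide probing signal over sliding windows.

    `scores` is a hypothetical list of per-nucleotide values (for example,
    normalized read-start counts). Returns one (mean, population std) pair
    per window position. This is an assumed feature design, not the
    published model.
    """
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    features = []
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        features.append((mean(chunk), pstdev(chunk)))
    return features


# Toy signal, invented for illustration only.
toy_scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
feats = window_features(toy_scores, window=3)
```

Window-level summaries of this kind could then be passed to any standard classifier to score candidate regions; which summaries are informative would have to be determined from the data.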

Availability: The source code of our implemented methods is available at http://yiplab.cse.cuhk.edu.hk/probrna/

Contact: kevinyip@cse.cuhk.edu.hk

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Binding Sites
Computational Biology*
High-Throughput Nucleotide Sequencing
Nucleic Acid Conformation
Protein Binding*
RNA / chemistry*
RNA-Binding Proteins / metabolism
Saccharomyces cerevisiae / genetics
Sequence Analysis, RNA / methods
Transcriptome

Substances

RNA-Binding Proteins
RNA