In silico identification of conserved intercoding sequences in Leishmania genomes: unraveling putative cis-regulatory elements

Mol Biochem Parasitol. 2012 Jun;183(2):140-50. doi: 10.1016/j.molbiopara.2012.02.009. Epub 2012 Feb 25.

Abstract

In silico analyses of Leishmania spp. genome data are a powerful resource to improve the understanding of these pathogens' biology. Trypanosomatids such as Leishmania spp. have their protein-coding genes grouped in long polycistronic units of functionally unrelated genes. The control of gene expression happens by a variety of posttranscriptional mechanisms. The high degree of synteny among Leishmania species is accompanied by highly conserved coding sequences (CDS) and poorly conserved intercoding untranslated sequences. To identify the elements involved in the control of gene expression, we conducted an in silico investigation to find conserved intercoding sequences (CICS) in the genomes of L. major, L. infantum, and L. braziliensis. We used a combination of computational tools, such as Linux-Shell, PERL and R languages, BLAST, MSPcrunch, SSAKE, and Pred-A-Term algorithms to construct a pipeline which was able to: (i) search for conservation in target-regions, (ii) eliminate CICS redundancy and mask repeat elements, (iii) predict the mRNA's extremities, (iv) analyze the distribution of orthologous genes within the generated LeishCICS-clusters, (v) assign GO terms to the LeishCICS-clusters, and (vi) provide statistical support for the gene-enrichment annotation. We associated the LeishCICS-cluster data, generated at the end of the pipeline, with the expression profile of L. donovani genes during promastigote-amastigote differentiation, as previously evaluated by others (GEO accession: GSE21936). A Pearson's correlation coefficient greater than 0.5 was observed for 730 LeishCICS-clusters containing from 2 to 17 genes. The designed computational pipeline is a useful tool and its application identified potential regulatory cis elements and putative regulons in Leishmania.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Computational Biology
  • Conserved Sequence*
  • DNA, Protozoan / genetics*
  • Genome, Protozoan
  • Leishmania braziliensis / genetics*
  • Leishmania infantum / genetics*
  • Leishmania major / genetics*
  • Regulatory Sequences, Nucleic Acid*
  • Sequence Analysis, DNA

Substances

  • DNA, Protozoan