BiDaS: a web-based Monte Carlo BioData Simulator based on sequence/feature characteristics

Nucleic Acids Res. 2013 Jul;41(Web Server issue):W582-6. doi: 10.1093/nar/gkt420. Epub 2013 May 28.

Abstract

BiDaS is a web application that can generate massive Monte Carlo simulated sequence or numerical feature data sets (e.g. dinucleotide content, composition, transition and distribution properties) based on small user-provided data sets. The BiDaS server enables users to analyze their data and generate large amounts of: (i) simulated DNA/RNA and amino acid (AA) sequences whose sequence and/or extracted-feature distributions are practically identical to those of the original data; (ii) simulated numerical features with identical distributions that preserve the exact 2D or 3D between-feature correlations observed in the original data sets. The server can project the provided sequences onto multidimensional feature spaces based on: (i) 38 DNA/RNA features describing conformational and physicochemical nucleotide sequence properties from the B-DNA-VIDEO database, (ii) 122 DNA/RNA features based on conformational and thermodynamic dinucleotide properties from the DiProDB database and (iii) the pseudo-amino acid composition of the initial sequences. To the best of our knowledge, this is the first available web server that allows users to generate vast numbers of biological data sets with realistic characteristics while keeping between-feature associations. These data sets can be used for a wide variety of current biological problems, such as the in-depth study of gene, transcript, peptide and protein groups/families, the creation of large data sets from just a few available members and the strengthening of machine learning classifiers. All simulations use advanced Monte Carlo sampling techniques. The BiDaS web application is available at http://bioserver-3.bioacademy.gr/Bioserver/BiDaS/.
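To make the idea of "simulated sequences following the feature distributions of the original data" concrete, below is a minimal Python sketch of one simple way to do it: fit a first-order Markov chain to a few input DNA sequences, Monte Carlo sample many new sequences from it, and compare dinucleotide content before and after. This is an illustrative stand-in, not BiDaS's method; the abstract does not specify the sampling algorithm. The function names (`fit_markov`, `sample_sequence`, `dinucleotide_content`) and the example sequences are hypothetical, and the sketch tracks only one feature (dinucleotide content). It does not attempt the between-feature correlation preservation described above.

```python
import random
from collections import Counter, defaultdict

NUCLEOTIDES = "ACGT"

def fit_markov(seqs):
    """Count start bases and first-order base-to-base transitions in the input."""
    starts = Counter()
    transitions = defaultdict(Counter)
    for s in seqs:
        s = s.upper()
        starts[s[0]] += 1
        for a, b in zip(s, s[1:]):
            transitions[a][b] += 1
    return starts, transitions

def sample_sequence(starts, transitions, length, rng):
    """Monte Carlo draw of one sequence from the fitted Markov chain."""
    bases = list(starts)
    seq = [rng.choices(bases, weights=[starts[b] for b in bases])[0]]
    while len(seq) < length:
        nxt = transitions.get(seq[-1])
        if not nxt:
            # Dead end: this base never had a successor in the input,
            # so fall back to the start-base distribution (a simplification).
            nxt = starts
        options = list(nxt)
        seq.append(rng.choices(options, weights=[nxt[b] for b in options])[0])
    return "".join(seq)

def dinucleotide_content(seqs):
    """Relative frequency of each of the 16 dinucleotides, pooled over all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(a + b for a, b in zip(s, s[1:]))
    total = sum(counts.values()) or 1
    return {a + b: counts[a + b] / total for a in NUCLEOTIDES for b in NUCLEOTIDES}

# Hypothetical "small user-provided data set" of three sequences.
rng = random.Random(0)
originals = ["ACGTACGGTCAGT", "TTGACGCAGTACC", "GGCATACGTTAGC"]
starts, trans = fit_markov(originals)
simulated = [sample_sequence(starts, trans, 13, rng) for _ in range(1000)]

orig_di = dinucleotide_content(originals)
sim_di = dinucleotide_content(simulated)
print(max(abs(orig_di[k] - sim_di[k]) for k in orig_di))
```

A first-order chain reproduces the observed transition probabilities, so dinucleotides absent from the input (here `AA` and `CT`) never appear in the output. Matching richer feature sets, or joint distributions across features, would need a more general sampler.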

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • DNA / chemistry*
  • Internet
  • Monte Carlo Method
  • Nucleic Acid Conformation
  • Proteins / chemistry*
  • RNA / chemistry*
  • Sequence Analysis / methods*
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, Protein / methods
  • Sequence Analysis, RNA / methods
  • Software*

Substances

  • Proteins
  • RNA
  • DNA