Dynameomics: design of a computational lab workflow and scientific data repository for protein simulations

Andrew M Simms; Rudesh D Toofanny; Catherine Kehl; Noah C Benson; Valerie Daggett

doi:10.1093/protein/gzn012

Protein Eng Des Sel. 2008 Jun;21(6):369-77. doi: 10.1093/protein/gzn012. Epub 2008 Apr 14.

Authors

Andrew M Simms¹, Rudesh D Toofanny, Catherine Kehl, Noah C Benson, Valerie Daggett

Affiliation

¹ Biomedical and Health Informatics Program, University of Washington, Seattle, WA 98195-5013, USA.

PMID: 18411223
DOI: 10.1093/protein/gzn012

Abstract

Dynameomics is a project to investigate and catalog the native-state dynamics and thermal unfolding pathways of representatives of all protein folds using solvated molecular dynamics simulations, as described in the preceding paper. Here we introduce the design of the molecular dynamics data warehouse, a scalable, reliable repository that houses simulation data and vastly simplifies their management and access. In the succeeding paper, we describe the development of a complementary multidimensional database. A single protein unfolding or native-state simulation can take weeks to months to complete and produces gigabytes of coordinate and analysis data. Mining information from over 3000 completed simulations is complicated and time-consuming. Even the simplest queries involve writing intricate programs that must be built from low-level file system access primitives and must include significant logic to correctly locate and parse the data of interest. As a result, programs to answer questions that require data from hundreds of simulations are very difficult to write. Thus, organization of and access to simulation data have been major obstacles to the discovery of new knowledge in the Dynameomics project. This repository is used internally and is the foundation of the Dynameomics portal site (http://www.dynameomics.org). By organizing simulation data into a scalable, manageable and accessible form, we can begin to address substantial questions that move us closer to solving biomedical and bioengineering problems.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Computer Simulation*
Databases, Protein*
Models, Molecular
Programming Languages
Proteins / chemistry*

Substances

Proteins

Grants and funding

3 T15 LM007442-04S1/LM/NLM NIH HHS/United States