3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures

Ruben Sanchez-Garcia; Carlos Oscar Sanchez Sorzano; Jose Maria Carazo; Joan Segura

doi:10.3390/molecules22122230

Molecules. 2017 Dec 15;22(12):2230. doi: 10.3390/molecules22122230.

Authors

Ruben Sanchez-Garcia¹, Carlos Oscar Sanchez Sorzano², Jose Maria Carazo³, Joan Segura⁴

Affiliations

¹ GN7 of the Spanish National Institute for Bioinformatics (INB) and Biocomputing Unit, National Center of Biotechnology (CSIC)/Instruct Image Processing Center, 28049 Madrid, Spain. rsanchez@cnb.csic.es.
² GN7 of the Spanish National Institute for Bioinformatics (INB) and Biocomputing Unit, National Center of Biotechnology (CSIC)/Instruct Image Processing Center, 28049 Madrid, Spain. coss@cnb.csic.es.
³ GN7 of the Spanish National Institute for Bioinformatics (INB) and Biocomputing Unit, National Center of Biotechnology (CSIC)/Instruct Image Processing Center, 28049 Madrid, Spain. carazo@cnb.csic.es.
⁴ GN7 of the Spanish National Institute for Bioinformatics (INB) and Biocomputing Unit, National Center of Biotechnology (CSIC)/Instruct Image Processing Center, 28049 Madrid, Spain. jsegura@cnb.csic.es.

Abstract

Many studies have used position-specific scoring matrix (PSSM) profiles to characterize residues in protein structures and to predict a broad range of protein features. Moreover, PSSM profiles of Protein Data Bank (PDB) entries have been recalculated in many works for different purposes. Although the computational cost of calculating a single PSSM profile is affordable, many statistical studies and machine learning-based methods use thousands of profiles to achieve their goals, which substantially increases the total computational cost. In this work we present a new database compiling PSSM profiles for the proteins of the PDB. Currently, the database contains 333,532 protein chain profiles involving 123,135 different PDB entries.
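
For readers unfamiliar with the term, the following is a minimal sketch of what a PSSM profile contains: one log-odds score per residue type for each position of an alignment. It is an illustration only and not the procedure used to build the database, which the abstract does not describe. The function name `toy_pssm`, the uniform background frequencies, the flat pseudocount and the three-sequence gap-free alignment are all assumptions made for this example; production profiles are typically generated by tools such as PSI-BLAST, which add sequence weighting and more elaborate pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column.

    Hypothetical helper for illustration: assumes equal-length, gap-free
    sequences and a uniform background distribution over the alphabet.
    """
    background = 1.0 / len(alphabet)  # assumption: uniform background
    n_seqs = len(alignment)
    denom = n_seqs + pseudocount * len(alphabet)
    pssm = []
    for column in zip(*alignment):  # iterate over alignment columns
        counts = Counter(column)
        scores = {}
        for aa in alphabet:
            # Smoothed observed frequency of this residue in the column
            freq = (counts.get(aa, 0) + pseudocount) / denom
            # Log-odds against the background frequency
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


# Toy alignment (made-up sequences, not taken from the PDB)
alignment = ["MKV", "MRV", "MKI"]
profile = toy_pssm(alignment)
```

Each entry of `profile` corresponds to one position; residues seen more often than the background expects receive positive scores and the rest receive negative scores. Repeating a calculation of this kind, with a database search to build each alignment, for thousands of chains is the cost that a precomputed database avoids.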

Keywords: machine learning; position-specific scoring matrices; protein databases; protein structure.

MeSH terms

Databases, Protein*
Position-Specific Scoring Matrices*
Protein Conformation
Proteins / chemistry*
Software

Substances

Proteins