De novo sequencing of proteins by mass spectrometry

Expert Rev Proteomics. 2020 Jul-Aug;17(7-8):595-607. doi: 10.1080/14789450.2020.1831387. Epub 2020 Oct 21.

Abstract

Introduction: Proteins are central to every cellular activity, and determining their sequence and structure is a crucial step toward fully understanding their biology. Early methods of protein sequencing relied mainly on enzymatic or chemical degradation of peptide chains. With the completion of the Human Genome Project and the growth of information available for each protein, various databases containing this sequence information were established.

Areas covered: This review covers de novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques, along with the various software tools currently available for proteogenomic analysis. Emphasis is placed on methods for de novo sequencing, together with the potential and shortcomings of using databases to interpret protein sequence data.

Expert opinion: As the performance of mass-spectrometry sequencing improves through better software, hardware optimizations, and user-friendly interfaces, de novo protein sequencing is becoming imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences and unexpected post-translational modifications (PTMs), as well as their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should be integrated into standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.
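
The target/decoy strategy mentioned above can be illustrated with a minimal sketch. It assumes the simplest common estimator (decoy hits divided by target hits at or above a score threshold) and a search against concatenated target and decoy sequences. The function name `estimate_fdr` and the `(score, is_decoy)` tuple layout are hypothetical choices for illustration, not taken from the review or from any particular search engine.

```python
from typing import Iterable, Tuple


def estimate_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the false discovery rate among peptide-spectrum matches (PSMs).

    Each PSM is a (score, is_decoy) pair; higher scores are assumed to be better.
    Assumption: FDR is approximated as decoys / targets among PSMs that pass
    the threshold, which is one of several estimators in use.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue  # PSM does not pass the score cut-off
        if is_decoy:
            decoys += 1
        else:
            targets += 1

    if targets == 0:
        # No accepted target hits: the estimate is undefined, so report the
        # worst case if any decoy passed and 0.0 if nothing passed at all.
        return 1.0 if decoys else 0.0
    return min(1.0, decoys / targets)


# Hypothetical scores for illustration only (not real data).
example_psms = [
    (9.1, False), (8.7, False), (8.2, True), (7.9, False),
    (7.5, False), (6.0, True), (5.5, False),
]
fdr_at_7 = estimate_fdr(example_psms, threshold=7.0)
```

In practice, the threshold is usually lowered until the estimated FDR reaches a chosen limit, and the same idea can be applied to de novo or sequence-tag results once they are scored against a decoy set.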

Keywords: Algorithms; database search; de novo; mass spectrometry; proteomics; sequence tags; sequencing; tandem MS.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Amino Acid Sequence / genetics
  • Computational Biology
  • Humans
  • Protein Processing, Post-Translational / genetics*
  • Proteins / genetics
  • Proteins / isolation & purification*
  • Proteomics / trends*
  • Sequence Analysis, Protein / trends*
  • Software
  • Tandem Mass Spectrometry

Substances

  • Proteins