Improving the genome and proteome annotations of the marine model diatom Thalassiosira pseudonana using a proteogenomics strategy

Mar Life Sci Technol. 2023 Feb 3;5(1):102-115. doi: 10.1007/s42995-022-00161-y. eCollection 2023 Feb.

Abstract

Diatoms are unicellular eukaryotic phytoplankton that account for approximately 20% of global carbon fixation and 40% of marine primary productivity; thus, they are essential for global carbon biogeochemical cycling and climate. The availability of ten diatom genome sequences has facilitated evolutionary, biological and ecological research over the past decade; however, a complimentary map of the diatom proteome with direct measurements of proteins and peptides is still lacking. Here, we present a proteome map of the model marine diatom Thalassiosira pseudonana using high-resolution mass spectrometry combined with a proteogenomic strategy. In-depth proteomic profiling of three different growth phases and three nutrient-deficient samples identified 9526 proteins, accounting for ~ 81% of the predicted protein-coding genes. Proteogenomic analysis identified 1235 novel genes, 975 revised genes, 104 splice variants and 234 single amino acid variants. Furthermore, our quantitative proteomic analysis experimentally demonstrated that a considerable number of novel genes were differentially translated under different nutrient conditions. These findings substantially improve the genome annotation of T. pseudonana and provide insights into new biological functions of diatoms. This relatively comprehensive diatom proteome catalog will complement available diatom genome and transcriptome data to advance biological and ecological research of marine diatoms.

Supplementary information: The online version contains supplementary material available at 10.1007/s42995-022-00161-y.

Keywords: Marine diatoms; Proteogenomics; Proteome; Thalassiosira pseudonana.