Interpreting protein abundance in Saccharomyces cerevisiae through relational learning

Daniel Brunnsåker; Filip Kronström; Ievgeniia A Tiukova; Ross D King

doi:10.1093/bioinformatics/btae050

Bioinformatics. 2024 Feb 1;40(2):btae050. doi: 10.1093/bioinformatics/btae050.

Authors

Daniel Brunnsåker¹, Filip Kronström¹, Ievgeniia A Tiukova²,³, Ross D King¹,⁴,⁵

Affiliations

¹ Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden.
² Department of Life Sciences, Chalmers University of Technology, Gothenburg 412 96, Sweden.
³ Department of Industrial Biotechnology, KTH Royal Institute of Technology, Stockholm 106 91, Sweden.
⁴ Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom.
⁵ The Alan Turing Institute, London NW1 2DB, United Kingdom.

Abstract

Motivation: Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in databases such as the Saccharomyces Genome Database, and highly curated genome-scale metabolic models like Yeast8. These datasets, the result of decades of experiments, are abundant in information, and adhere to semantically meaningful ontologies.

Results: By representing this knowledge in an expressive Datalog database, we generated data descriptors using relational learning that, when combined with supervised machine learning, enable us to predict protein abundances in an explainable manner. We learnt predictive relationships between protein abundances, function and phenotype, such as α-amino acid accumulations and deviations in chronological lifespan. We further demonstrate the power of this methodology on the proteins His4 and Ilv2, connecting qualitative biological concepts to quantified abundances.
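The idea of turning relational facts into descriptors for a supervised model can be illustrated with a minimal sketch. It is not the authors' pipeline: the facts, strain and gene names, ontology terms and abundance values below are all invented, the single hand-written rule stands in for descriptors that a relational learner would discover, and a simple mean difference stands in for the supervised model.

```python
# Toy facts in the spirit of a Datalog database. All names and values are
# invented for illustration; they are not taken from the paper's data.
deleted = {("strain1", "geneA"), ("strain2", "geneB"), ("strain3", "geneC")}
annotated = {
    ("geneA", "amino_acid_biosynthesis"),
    ("geneB", "amino_acid_biosynthesis"),
    ("geneC", "lipid_metabolism"),
}


def holds(strain, term):
    """Hand-written rule: feature(S, T) :- deleted(S, G), annotated(G, T)."""
    return any((g, term) in annotated for s, g in deleted if s == strain)


def propositionalise(strains, terms):
    """Evaluate the rule for every strain/term pair to get a binary matrix."""
    return [[int(holds(s, t)) for t in terms] for s in strains]


def mean_difference(X, y, j):
    """Mean target value where descriptor j holds minus where it does not."""
    on = [yi for row, yi in zip(X, y) if row[j]]
    off = [yi for row, yi in zip(X, y) if not row[j]]
    return sum(on) / len(on) - sum(off) / len(off)


strains = ["strain1", "strain2", "strain3"]
terms = ["amino_acid_biosynthesis", "lipid_metabolism"]
X = propositionalise(strains, terms)

# Invented abundance values for one hypothetical protein, one per strain.
y = [2.0, 3.0, 1.0]
effect = mean_difference(X, y, 0)
```

Each column of `X` is a human-readable statement about a strain, so any association between a column and the measured abundance can be read back as a qualitative biological claim. This readability is what the descriptors are meant to provide.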

Availability and implementation: All data and processing scripts are available in the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Phenotype
Proteomics
Saccharomyces cerevisiae Proteins* / genetics
Saccharomyces cerevisiae* / genetics
Systems Biology / methods

Substances

Saccharomyces cerevisiae Proteins
