Self-supervised learning for macromolecular structure classification based on cryo-electron tomograms

Front Physiol. 2022 Aug 30:13:957484. doi: 10.3389/fphys.2022.957484. eCollection 2022.

Abstract

Macromolecular structure classification from cryo-electron tomography (cryo-ET) data is important for understanding macro-molecular dynamics. It has a wide range of applications and is essential in enhancing our knowledge of the sub-cellular environment. However, a major limitation has been insufficient labelled cryo-ET data. In this work, we use Contrastive Self-supervised Learning (CSSL) to improve the previous approaches for macromolecular structure classification from cryo-ET data with limited labels. We first pretrain an encoder with unlabelled data using CSSL and then fine-tune the pretrained weights on the downstream classification task. To this end, we design a cryo-ET domain-specific data-augmentation pipeline. The benefit of augmenting cryo-ET datasets is most prominent when the original dataset is limited in size. Overall, extensive experiments performed on real and simulated cryo-ET data in the semi-supervised learning setting demonstrate the effectiveness of our approach in macromolecular labeling and classification.

Keywords: contrastive learning; data augmentation; electron cryo tomograms; macromolecular structure classification; self-supervised learning.