Decoding phenotypic screening: A comparative analysis of image representations

Comput Struct Biotechnol J. 2024 Mar 12:23:1181-1188. doi: 10.1016/j.csbj.2024.02.022. eCollection 2024 Dec.

Abstract

Biomedical imaging techniques such as high content screening (HCS) are valuable for drug discovery, but high costs limit their use to pharmaceutical companies. To address this issue, The JUMP-CP consortium released a massive open image dataset of chemical and genetic perturbations, providing a valuable resource for deep learning research. In this work, we aim to utilize the JUMP-CP dataset to develop a universal representation model for HCS data, mainly data generated using U2OS cells and CellPainting protocol, using supervised and self-supervised learning approaches. We propose an evaluation protocol that assesses their performance on mode of action and property prediction tasks using a popular phenotypic screening dataset. Results show that the self-supervised approach that uses data from multiple consortium partners provides representation that is more robust to batch effects whilst simultaneously achieving performance on par with standard approaches. Together with other conclusions, it provides recommendations on the training strategy of a representation model for HCS images.

Keywords: Activity prediction; Deep Learning; High Content Screening; Image representation; Self-supervised learning.