Similarity-guided graph contrastive learning for lncRNA-disease association prediction

J Mol Biol. 2024 May 18:168609. doi: 10.1016/j.jmb.2024.168609. Online ahead of print.

Abstract

Increasing research evidence indicates that long non-coding RNAs (lncRNAs) play important roles in regulating biological processes and are closely associated with many human diseases. Because traditional biological experiments are time-consuming and costly, computational methods have become indispensable tools for identifying associations between lncRNAs and diseases. Given the scarcity of verified lncRNA-disease associations, deep learning is receiving growing attention as a way to improve the accuracy of predictive models. Moreover, contrastive learning offers a clear advantage when data are scarce or annotation costs are high. In this paper, we combine the strengths of graph neural networks and contrastive learning and propose a similarity-guided graph contrastive learning (SGGCL) model for predicting lncRNA-disease associations. In the SGGCL model, we employ a novel similarity-guided graph data augmentation method to generate high-quality positive and negative sample pairs, addressing the scarcity of verified data. Additionally, we use the random walk with restart (RWR) algorithm and a graph convolutional neural network for contrastive learning, which helps capture global topology and high-level node embeddings. Experimental results on several datasets demonstrate that our method achieves superior predictive performance and scalability in lncRNA-disease association prediction compared with state-of-the-art methods.
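
As an illustration of how an RWR step can capture global topology, the following is a minimal sketch of a generic random walk with restart on a toy graph. It is not the authors' implementation: the function name, the restart probability of 0.5, the convergence tolerance, and the 4-node path graph are all assumptions chosen for the example.

```python
import numpy as np


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-8, max_iter=1000):
    """Generic RWR sketch (hypothetical helper, not taken from the paper).

    adj     : square adjacency matrix of an undirected graph
    seed    : index of the node the walker restarts from
    restart : probability of jumping back to the seed at each step (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    transition = adj / col_sums        # column-normalized transition matrix

    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0                     # all probability mass starts at the seed
    p = p0.copy()
    for _ in range(max_iter):
        # One step: walk along an edge, or restart at the seed.
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path graph 0 - 1 - 2 - 3 (assumed data for illustration only).
toy_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(toy_adj, seed=0)
print(scores)
```

The resulting vector assigns every node a proximity score relative to the seed, including nodes that are not direct neighbors, which is the sense in which RWR reflects global rather than purely local structure.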

Keywords: association prediction; contrastive learning; graph convolutional networks; lncRNA-disease.