Predicting potential target genes in molecular biology experiments using machine learning and multifaceted data sources

iScience. 2024 Feb 23;27(3):109309. doi: 10.1016/j.isci.2024.109309. eCollection 2024 Mar 15.

Abstract

Experimental analysis of functionally related genes is key to understanding biological phenomena. The selection of genes to study is a crucial and challenging step, as it requires extensive knowledge of the literature and diverse biomedical data resources. Although software tools that predict relationships between genes are available to accelerate this process, they do not directly incorporate experiment information derived from the literature. Here, we develop LEXAS, a target gene suggestion system for molecular biology experiments. LEXAS is based on machine learning models trained with diverse information sources, including 24 million experiment descriptions extracted from full-text articles in PubMed Central by using a deep-learning-based natural language processing model. By integrating the extracted experiment contexts with biomedical data sources, LEXAS suggests potential target genes for upcoming experiments, complementing existing tools like STRING, FunCoup, and GOSemSim. A simple web interface enables biologists to consider newly derived gene information while planning experiments.

Keywords: Molecular biology; Natural language processing.