Enhancing genome-wide populus trait prediction through deep convolutional neural networks

Huaichuan Duan; Xiangwei Dai; Quanshan Shi; Yan Cheng; Yutong Ge; Shan Chang; Wei Liu; Feng Wang; Hubing Shi; Jianping Hu

doi:10.1111/tpj.16790

Plant J. 2024 May 13. doi: 10.1111/tpj.16790. Online ahead of print.

Authors

Huaichuan Duan^#¹ ², Xiangwei Dai^#³, Quanshan Shi^#², Yan Cheng¹, Yutong Ge², Shan Chang³, Wei Liu⁴, Feng Wang³ ⁵, Hubing Shi¹, Jianping Hu²

Affiliations

¹ Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China.
² Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China.
³ School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China.
⁴ School of Life Science, Leshan Normal University, Leshan, China.
⁵ School of Computer Engineering, Suzhou Vocational University, Suzhou, China.

^# Contributed equally.

PMID: 38741374
DOI: 10.1111/tpj.16790

Abstract

As a promising model, genome-based plant breeding has greatly promoted the improvement of agronomic traits. Traditional methods typically adopt linear regression models with clear assumptions, which neither capture the linkage between phenotype and genotype nor offer good ideas for modification. Nonlinear models are well suited to capturing complex nonadditive effects, filling the gap left by traditional methods. Taking populus as the research object, this paper constructs a deep learning method, DCNGP, which can effectively predict traits spanning 65 phenotypes. The method was trained on three datasets and compared with four other classic models: Bayesian ridge regression (BRR), Elastic Net, support vector regression, and dualCNN. The results show that DCNGP has five typical performance advantages: (1) strong prediction ability on multiple experimental datasets; (2) batch normalization layers and early stopping, which enhance generalization and prediction stability on test data; (3) learning potent features directly from the data, thus circumventing the tedious steps of manual feature production; (4) a Gaussian Noise layer, which enhances predictive capability in the presence of inherent uncertainties or perturbations; and (5) fewer hyperparameters, which helps reduce tuning time across datasets and improves auto-search efficiency. In this way, DCNGP shows powerful predictive ability from genotype to phenotype, which provides an important theoretical reference for building more robust populus breeding programs.
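
Two of the regularization ideas named in the abstract, a Gaussian noise layer and early stopping, can be illustrated with a minimal, dependency-free sketch. This is not the DCNGP implementation: the function and class names, the 0/1/2 genotype coding, the placement of the noise on the inputs, the noise level, the patience value, and the validation losses are all assumptions chosen for illustration.

```python
import random


def gaussian_noise(features, stddev, training, rng=None):
    """Add zero-mean Gaussian noise during training; pass through unchanged at inference."""
    if not training:
        return list(features)
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, stddev) for x in features]


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


# Hypothetical genotype vector coded as allele counts (0, 1, 2).
genotype = [0.0, 1.0, 2.0, 1.0]
noisy = gaussian_noise(genotype, stddev=0.1, training=True, rng=random.Random(0))
clean = gaussian_noise(genotype, stddev=0.1, training=False)

# Made-up validation losses, one per epoch, standing in for a real training loop.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "best epoch", stopper.best_epoch)
```

In this toy run the loss stops improving after epoch 2, so training halts three epochs later and the final, lower loss is never reached. That is the usual trade-off of a small patience value. In a deep learning framework, both pieces would normally come from built-in layers and callbacks rather than hand-written code.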

Keywords: Gaussian Noise layer; convolutional neural network; deep learning; fewer hyperparameters; populus breeding.

Grants and funding

2022YFD2200100/the National Key Research and Development Program of China