Association between the Classification of the Genus of Batrachospermaceae (Rhodophyta) and the Environmental Factors Based on Machine Learning

Plants (Basel). 2022 Dec 13;11(24):3485. doi: 10.3390/plants11243485.

Abstract

Batrachospermaceae is the largest family of freshwater red algae. It is distributed worldwide and plays an important role in maintaining the balance of spring and creek ecosystems. The ongoing deterioration of the global ecological environment has damaged the habitats of Batrachospermaceae. Research on the environmental factors associated with Batrachospermaceae and accurate genus-level classification are therefore necessary for the protection, restoration, exploration, and utilization of Batrachospermaceae resources. In this paper, a database of the geographical distribution and environmental factors of Batrachospermaceae was compiled, and the relationship between genus classification and environmental factors was analyzed using two machine learning methods, random forest and XGBoost. The results show that: (1) the models constructed with both methods can effectively distinguish the genera of Batrachospermaceae based on environmental factors; (2) the overall AUC score of the random forest model for the classification and prediction of the genera reached 90.41%, and the overall AUC score of the taxonomic prediction of each genus reached 85.85%; (3) taken together, the two methods indicate that the environmental factors that mainly affect the distinction between genera are altitude, average relative humidity, average temperature, and minimum temperature, among which altitude has the greatest influence. These results can further clarify the genus-level taxonomy of Batrachospermaceae and enrich research on the differences in environmental factors of Batrachospermaceae.
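The AUC scores reported above summarize how well predicted class probabilities rank the true genus above the others. The abstract does not say how the multiclass AUC was averaged, so the following is only a minimal sketch that assumes a one-vs-rest, macro-averaged scheme. The function names, the placeholder labels (`genus_A`, `genus_B`, `genus_C`), and the probabilities are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels: Sequence[str], probs: List[Dict[str, float]]) -> float:
    """One-vs-rest AUC per class, then an unweighted mean over classes.

    ASSUMPTION: macro averaging; the source does not state its scheme.
    """
    classes = sorted(set(labels))
    per_class = []
    for c in classes:
        scores = [p[c] for p in probs]       # predicted probability of class c
        is_pos = [y == c for y in labels]    # class c versus all other classes
        per_class.append(binary_auc(scores, is_pos))
    return sum(per_class) / len(per_class)


# Hypothetical toy input: true genus labels and made-up model probabilities.
toy_labels = ["genus_A", "genus_A", "genus_B", "genus_B", "genus_C", "genus_C"]
toy_probs = [
    {"genus_A": 0.7, "genus_B": 0.2, "genus_C": 0.1},
    {"genus_A": 0.4, "genus_B": 0.5, "genus_C": 0.1},
    {"genus_A": 0.2, "genus_B": 0.6, "genus_C": 0.2},
    {"genus_A": 0.3, "genus_B": 0.3, "genus_C": 0.4},
    {"genus_A": 0.1, "genus_B": 0.2, "genus_C": 0.7},
    {"genus_A": 0.5, "genus_B": 0.1, "genus_C": 0.4},
]
toy_auc = macro_ovr_auc(toy_labels, toy_probs)
```

In practice the probabilities would come from a fitted classifier such as a random forest or XGBoost model, and a library routine would normally replace the pairwise loop, which is quadratic in the number of samples and is written this way only to make the definition explicit.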

Keywords: Batrachospermaceae; XGBoost; environmental factors; machine learning; random forest.