The application of Uniform Manifold Approximation and Projection (UMAP) for unconstrained ordination and classification of biological indicators in aquatic ecology

Sci Total Environ. 2022 Apr 1:815:152365. doi: 10.1016/j.scitotenv.2021.152365. Epub 2021 Dec 25.

Abstract

The analysis of community structure in studies of freshwater ecology often requires dimensionality reduction to process multivariate data. A high number of dimensions (number of taxa/environmental parameters × number of samples), nonlinear relationships, outliers, and high variability usually hinder the visualization and interpretation of multivariate datasets. Here, we propose a new statistical design that uses Uniform Manifold Approximation and Projection (UMAP), together with community partitioning by the Louvain algorithm, to ordinate and classify the structure of aquatic biota in two-dimensional space. We demonstrate this approach on five previously published datasets for diatoms, macrophytes, chironomids (larval and subfossil), and fish. Principal Component Analysis (PCA) and Ward's clustering were also used to compare the UMAP approach with traditional approaches for ordination and classification. The ordination of sampling sites in two-dimensional space showed a much denser, and easier to interpret, grouping with the UMAP approach than with PCA. For data with a high number of dimensions, the classification of community structure using the Louvain algorithm in UMAP ordination space showed a higher classification strength than the cluster patterns obtained with Ward's algorithm in PCA space. Environmental gradients, presented via heat maps, were overlaid on the ordination patterns of aquatic communities, confirming that the ordinations obtained by UMAP were ecologically meaningful. This is the first study to apply a UMAP approach with classification using the Louvain algorithm to ecological datasets. We show that the performance on local and global structures, as well as the number of clusters being determined by the algorithm, makes this approach more powerful than traditional approaches.
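The workflow summarized in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the software used, so the libraries (umap-learn, NetworkX, scikit-learn), the function names, the parameter values, and the choice to run the Louvain algorithm on a k-nearest-neighbour graph built from the 2-D coordinates are all assumptions. The short demonstration at the end uses synthetic coordinates rather than any of the published datasets.

```python
import numpy as np
from networkx import Graph
from networkx.algorithms.community import louvain_communities


def umap_ordination(X, n_neighbors=15, min_dist=0.1, seed=0):
    """Project a samples x taxa matrix to 2-D with UMAP (assumes umap-learn)."""
    import umap  # lazy import: only needed when this function is called
    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors,
                        min_dist=min_dist, random_state=seed)
    return reducer.fit_transform(X)


def louvain_on_embedding(embedding, k=10, seed=0):
    """Label points by Louvain partitioning of a k-nearest-neighbour graph.

    Assumption: the graph is built from Euclidean distances in the
    ordination space; the number of clusters is not set in advance.
    """
    emb = np.asarray(embedding, dtype=float)
    n = emb.shape[0]
    # Brute-force pairwise Euclidean distances (fine for small site counts).
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    graph = Graph()
    graph.add_nodes_from(range(n))
    for i in range(n):
        for j in neighbours[i]:
            graph.add_edge(i, int(j))
    communities = louvain_communities(graph, seed=seed)
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(communities):
        labels[list(members)] = label
    return labels


def pca_ward_baseline(X, n_clusters):
    """Traditional comparison: PCA to 2-D, then Ward's clustering.

    Unlike Louvain, Ward's method needs n_clusters chosen by the analyst.
    """
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.decomposition import PCA
    scores = PCA(n_components=2).fit_transform(X)
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return model.fit_predict(scores)


# Synthetic stand-in for a 2-D ordination: two well-separated groups of sites.
rng = np.random.default_rng(0)
demo_embedding = np.vstack([rng.normal(0, 1, (30, 2)),
                            rng.normal(100, 1, (30, 2))])
demo_labels = louvain_on_embedding(demo_embedding, k=10)
```

With a real samples × taxa matrix `X`, the two approaches would be compared by calling `louvain_on_embedding(umap_ordination(X))` and `pca_ward_baseline(X, n_clusters)`. Louvain may split a dense group into several communities, so the number of clusters it returns depends on `k` and on the data.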

Keywords: Aquatic ecology; Classification; Community structure; Dimensionality reduction; Multivariate approach; Ordination.

MeSH terms

  • Algorithms*
  • Animals
  • Cluster Analysis
  • Environmental Biomarkers*
  • Hydrobiology
  • Principal Component Analysis

Substances

  • Environmental Biomarkers