Unveiling elemental fingerprints: A comparative study of clustering methods for multi-element nanoparticle data

Sci Total Environ. 2023 Dec 20:905:167176. doi: 10.1016/j.scitotenv.2023.167176. Epub 2023 Sep 18.

Abstract

Single particle-inductively coupled plasma-time of flight-mass spectrometers (SP-ICP-TOF-MS) generates large datasets of the multi-elemental composition of nanoparticles. However, extracting useful information from such datasets is challenging. Hierarchical clustering (HC) has been successfully applied to extract elemental fingerprints from multi-element nanoparticle data obtained by SP-ICP-TOF-MS. However, many other clustering approaches can be applied to analyze SP-ICP-TOF-MS data that have not yet been evaluated. This study fills this knowledge gap by comparing the performance of three clustering approaches: HC, spectral clustering, and t-distributed Stochastic Neighbor Embedding coupled with Density-Based Spatial Clustering of Applications with Noise (tSNE-DBSCAN) for analyzing SP-ICP-TOF-MS data. The performance of these clustering techniques was evaluated by comparing the size of the extracted clusters and the similarity of the elemental composition of nanoparticles within each cluster. Hierarchical clustering often failed to achieve an optimal clustering solution for SP-ICP-TOF-MS data because HC is sensitive to the presence of outliers. Spectral clustering and tSNE-DBSCAN extracted clusters that were not identified by HC. This is because spectral clustering, a method developed based on graph theory, reveals the global and local structure in the data. tSNE reduces and maps the data into a lower-dimensional space, enabling clustering algorithms such as DBSCAN to identify subclusters with subtle differences in their elemental composition. However, tSNE-DBSCAN can lead to unsatisfactory clustering solutions because tuning the perplexity hyperparameter of tSNE is a difficult and a time-consuming task, and the relative distance between datapoints is not maintained. Although the three clustering approaches successfully extract useful information from SP-ICP-TOF-MS data, spectral clustering outperforms HC and tSNE-DBSCAN by generating clusters of a large number of nanoparticles with similar elemental compositions.

Keywords: Engineered nanoparticles; High dimensional data; Mass spectrometry; Multi-element single nanoparticle; Nonlinear clustering; Spectral clustering; tSNE.