The impact of combining data sets of fluorescence excitation-emission matrices of dissolved organic matter from various aquatic sources on the information retrieved by PARAFAC modeling

Spectrochim Acta A Mol Biomol Spectrosc. 2021 Sep 5:258:119800. doi: 10.1016/j.saa.2021.119800. Epub 2021 Apr 9.

Abstract

Although fluorescence spectroscopy coupled with Parallel Factor Analysis (PARAFAC) has been widely used to investigate Fluorescent Dissolved Organic Matter (FDOM) in aquatic systems, the proper performance of PARAFAC analysis on datasets originating from various sources cannot be taken for granted. In this study, we examine the impact of co-analyzing datasets from various natural water systems located in the same geographical region of the Eastern Mediterranean Sea. For this purpose, three datasets were formed, representative of open sea waters (SW), rivers and streams (RV), and lagoons (LG). The Excitation Emission Matrices (EEMs) derived from fluorescence analysis were subjected to individual PARAFAC analysis per dataset as well as to combined analyses, i.e., SWRV, SWLG, RVLG, and ALL (SW-RV-LG). We evaluated the reliability of the components validated in the combined models by examining the models' residuals and the correlation between components. We also assessed the similarity of the components common to different models with regard to (a) spectral position, by calculating the Tucker congruence coefficient (TCC) of the excitation and emission loadings of the PARAFAC components, and (b) fluorescence intensity, through regression analysis of Fmax among models. Our analysis showed that, for natural waters of various sources within the same geographical region, combined PARAFAC modeling can have both negative and positive impacts. In the combined SWLG and RVLG models, the PARAFAC analysis was able to resolve the fulvic-like component that was initially observed only in the LG dataset, and thus a new component was resolved for the SW and RV datasets. The fulvic-like component was in fact identified for the first time in the open sea using the combined datasets. Moreover, in the combined SWRV analysis a tyrosine-like component was resolved that was initially found only in the RV dataset. In contrast, the tyrosine-like component was lost in the combined RVLG dataset. We also show that extra components resolved in a combined analysis are not always a good fit for the dataset, and that the model should be assessed in terms of residuals prior to acceptance. Finally, our study suggests that the similarity of the common components between combined and individual models depends largely on the similarity between the components of the individual models, and that the estimation of the Fmax of a component is probably less affected by data diversity than the estimation of its spectral position.
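
As an illustration of the spectral comparison described above, the Tucker congruence coefficient between two loading vectors x and y is the sum of their element-wise products divided by the square root of the product of their sums of squares. The following is a minimal sketch, not the authors' implementation: the function name `tucker_congruence` and the loading values are hypothetical, and the vectors are assumed to be sampled on the same wavelength grid.

```python
import math


def tucker_congruence(x, y):
    """Tucker congruence coefficient between two equal-length loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    # Numerator: sum of element-wise products of the two loadings.
    numerator = sum(a * b for a, b in zip(x, y))
    # Denominator: square root of the product of the sums of squares.
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        raise ValueError("TCC is undefined for an all-zero vector")
    return numerator / denominator


# Hypothetical emission loadings of one component, as resolved by an
# individual model and by a combined model (illustrative values only).
em_individual = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]
em_combined = [0.0, 0.1, 0.7, 1.0, 0.7, 0.2]

tcc_em = tucker_congruence(em_individual, em_combined)
print(f"Emission TCC: {tcc_em:.3f}")
```

The same function would be applied to the excitation loadings. A value close to 1 indicates nearly identical spectral shape, and because the coefficient is insensitive to scaling it reflects spectral position rather than intensity, which is why intensity (Fmax) is compared separately.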

Keywords: EEMs; Eastern Mediterranean; Fluorescence spectroscopy; Fulvic; PARAFAC.