A new method for correlation analysis of compositional (environmental) data - a worked example

Sci Total Environ. 2017 Dec 31:607-608:965-971. doi: 10.1016/j.scitotenv.2017.06.063. Epub 2017 Jul 27.

Abstract

Most data in environmental sciences and geochemistry are compositional. Already the unit used to report the data (e.g., μg/l, mg/kg, wt%) implies that the analytical results for each element are not free to vary independently of the other measured variables. This is often neglected in statistical analysis, where a simple log-transformation of the single variables is insufficient to put the data into an acceptable geometry. This is also important for bivariate data analysis and for correlation analysis, for which the data need to be appropriately log-ratio transformed. A new approach based on the isometric log-ratio (ilr) transformation, leading to so-called symmetric coordinates, is presented here. Summarizing the correlations in a heat-map gives a powerful tool for bivariate data analysis. Here an application of the new method using a data set from a regional geochemical mapping project based on soil O and C horizon samples is demonstrated. Differences to 'classical' correlation analysis based on log-transformed data are highlighted. The fact that some expected strong positive correlations appear and remain unchanged even following a log-ratio transformation has probably led to the misconception that the special nature of compositional data can be ignored when working with trace elements. The example dataset is employed to demonstrate that using 'classical' correlation analysis and plotting XY diagrams, scatterplots, based on the original or simply log-transformed data can easily lead to severe misinterpretations of the relationships between elements.

Keywords: CoDa; Compositional data analysis; Correlation; Log-ratio methodology; Scatterplot.