Advancing forensic research: An examination of compositional data analysis with an application on petrol fraud detection

Sci Justice. 2024 Jan;64(1):9-18. doi: 10.1016/j.scijus.2023.11.003. Epub 2023 Nov 24.

Abstract

In recent years, numerous studies have examined the chemical compounds of petrol and petrol data for forensic research. Standard quantitative methods often assume that the variables or compounds do not have compositional constraints or are not part of a constrained whole, operating within an Euclidean vector space. However, chemical compounds are typically part of a whole, and the appropriate vector space for their analysis is the simplex. Biased and arbitrary results result when statistical analysis are applied on such data without proper pre-processing of such data. Compositional analysis of data has not yet been considered in forensic science. Therefore, we compare classical statistical analysis as applied in forensic research and the new proposed paradigm of compositional data analysis (CoDa). It is demonstrated how such analysis improves the analysis in petrol and forensic science. Our study shows how principal component analysis (PCA) and classification results are affected by the preprocessing steps performed on the raw data. Our results indicate that results from a log ratio analysis provides a better separation between subgroups of the data and leads to an easier interpretation of the results. In addition, with a compositional analysis a higher classification accuracy is obtained. Even a non-linear classification method - in our case a random forest - was shown to perform poorly when applied without using compositional methods. Moreover, normalization of samples due to laboratory/unit-of-measurement effects is no longer necessary, since the composition of an observation is in compositional thinking equivalent to a multiple of it, because the used (log) ratios on raw and log ratio transformed data are equal. Petrol data from different petrol stations in Brazil are used for the demonstration. This data is highly susceptible to counterfeit petrol. Forensic analysis of its chemical elements requires non-biased statistical analysis designed for compositional data to detect fraud. Based on these results, we recommend the use of compositional data methods for gasoline and petrol chemical element analysis and gasoline product characterization, authentication and fraud detection in forensic sciences.

Keywords: Chemical compounds; Classification; Compositional data analysis; Forensic science; Petrol data.