Advancing forensic research: An examination of compositional data analysis with an application on petrol fraud detection
M. Templ, J. Gonzalez-Rodriguez
Science & Justice (journal article), published 2023-11-24
DOI: 10.1016/j.scijus.2023.11.003
https://www.sciencedirect.com/science/article/pii/S1355030623001223
Citations: 0
Abstract
In recent years, numerous studies have examined the chemical compounds of petrol for forensic research. Standard quantitative methods often assume that the variables or compounds are not subject to compositional constraints, i.e. that they are not parts of a constrained whole, and therefore operate in a Euclidean vector space. However, chemical compounds are typically parts of a whole, and the appropriate space for their analysis is the simplex. Applying statistical analyses to such data without proper preprocessing leads to biased and arbitrary results. Compositional data analysis (CoDa) has not yet been considered in forensic science. We therefore compare the classical statistical analysis used in forensic research with the proposed CoDa paradigm and demonstrate how CoDa improves analyses in petrol and forensic science. Our study shows how principal component analysis (PCA) and classification results are affected by the preprocessing steps performed on the raw data.
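The bias introduced by the constant-sum constraint can be illustrated with a small numerical sketch. It assumes NumPy and uses synthetic, independently generated amounts rather than the petrol data; the helper `closure` is a hypothetical name defined here, not code from the paper. Once each observation is rescaled to proportions, the covariances of each part with the other parts must sum to minus its own variance, so negative associations appear that a Euclidean analysis could mistake for real chemical relationships.

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale each row (observation) so that its parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=1, keepdims=True)

# Synthetic example (assumption): three amounts drawn independently,
# so the raw parts carry no real association with each other.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(500, 3))

closed = closure(raw)  # proportions: every row now sums to 1

cov_closed = np.cov(closed, rowvar=False)

# Because each row sums to a constant, Cov(x_i, x_1 + ... + x_D) = 0,
# so every row of the covariance matrix sums to (numerically) zero.
row_sums = cov_closed.sum(axis=1)

# Hence the off-diagonal covariances of part i add up to -Var(x_i) < 0:
# closure alone forces negative association between the parts.
off_diag_sums = row_sums - np.diag(cov_closed)
print(off_diag_sums)
```

This is why correlations or a PCA computed directly on percentages or normalized concentrations can be misleading, independently of the underlying chemistry.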
Our results indicate that a log-ratio analysis provides better separation between subgroups of the data and makes the results easier to interpret. In addition, compositional analysis yields higher classification accuracy. Even a non-linear classification method (in our case, a random forest) performed poorly when applied without compositional methods. Moreover, normalizing samples to remove laboratory or unit-of-measurement effects is no longer necessary: in compositional thinking, an observation is equivalent to any positive multiple of itself, because the (log-)ratios used in the analysis are identical whether computed from the raw or from the rescaled data.
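The scale invariance described above can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic positive measurements; the helpers `clr`, `pairwise_log_ratio` and `clr_pca` are hypothetical names, and the centred log-ratio (clr) is used as one standard log-ratio transformation, since this abstract does not state which coordinates or software the authors used.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def pairwise_log_ratio(x, i, j):
    """Log-ratio between parts i and j for every observation."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:, i] / x[:, j])

def clr_pca(x, n_components=2):
    """PCA on clr coordinates via SVD of the column-centred clr matrix."""
    z = clr(x)
    z = z - z.mean(axis=0)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings

# Hypothetical measurements: 50 samples, 4 strictly positive parts.
rng = np.random.default_rng(1)
x = rng.lognormal(size=(50, 4))

# Simulated laboratory/unit effects: each sample has its own scale factor.
factors = rng.uniform(0.1, 10.0, size=(50, 1))
x_rescaled = x * factors

# The scale factor cancels inside every log-ratio, so no per-sample
# normalization is needed before a log-ratio analysis.
print(np.allclose(clr(x), clr(x_rescaled)))  # True
print(np.allclose(pairwise_log_ratio(x, 0, 1),
                  pairwise_log_ratio(x_rescaled, 0, 1)))  # True

scores, loadings = clr_pca(x_rescaled)
print(scores.shape, loadings.shape)  # (50, 2) (4, 2)
```

Clr coordinates sum to zero in each row, so their covariance matrix is singular and a PCA on D parts yields at most D - 1 informative components; isometric log-ratio coordinates are a common alternative when a full-rank representation is needed.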
Petrol data from different petrol stations in Brazil are used for the demonstration. Petrol in this market is highly susceptible to counterfeiting, and detecting fraud through forensic analysis of its chemical elements requires unbiased statistical methods designed for compositional data.
Based on these results, we recommend using compositional data methods for the chemical element analysis of gasoline and petrol, and for gasoline product characterization, authentication and fraud detection in the forensic sciences.
About the journal:
Science & Justice provides a forum to promote communication and publication of original articles, reviews and correspondence on subjects that spark debates within the Forensic Science Community and the criminal justice sector. The journal provides a medium whereby all aspects of applying science to legal proceedings can be debated and progressed. Science & Justice is published six times a year, and will be of interest primarily to practising forensic scientists and their colleagues in related fields. It is chiefly concerned with the publication of formal scientific papers, in keeping with its international learned status, but will not accept any article describing experimentation on animals which does not meet strict ethical standards.
To promote communication and informed debate within the Forensic Science Community and the criminal justice sector.
To promote the publication of learned and original research findings from all areas of the forensic sciences and by so doing to advance the profession.
To promote the publication of case based material by way of case reviews.
To promote the publication of conference proceedings which are of interest to the forensic science community.
To provide a medium whereby all aspects of applying science to legal proceedings can be debated and progressed.
To appeal to all those with an interest in the forensic sciences.