Multifactorial analysis of fluorescence detection for soil total petroleum hydrocarbons using random forest and multiple linear regression

IF 3.7 | Zone 2, Chemistry | Q2, AUTOMATION & CONTROL SYSTEMS
Gaoyong Shi, Ruifang Yang, Nanjing Zhao, Gaofang Yin, Wenqing Liu
Journal: Chemometrics and Intelligent Laboratory Systems, Volume 264, Article 105444
Publication date: 2025-05-30
DOI: 10.1016/j.chemolab.2025.105444
URL: https://www.sciencedirect.com/science/article/pii/S0169743925001297

Abstract

This study combined random forest (RF) and multiple linear regression (MLR) to analyze how several factors influence the fluorescence detection of total petroleum hydrocarbons (TPH) in soil. We considered the effects of soil moisture, organic matter, and minerals, and tested samples of three common soil types at varying petroleum hydrocarbon concentrations using fluorescence imaging technology developed in-house. The fluorescence signals are strongly influenced by moisture, organic matter, and minerals, and the effects differ with soil type and hydrocarbon concentration. The RF model improves accuracy and consistency by combining multiple decision trees, which makes it suitable for non-linear and high-dimensional data, although it underperformed in our study. The MLR model captures the linear relationships between variables and showed better statistical performance and consistency in most cases of our experiment, with a coefficient of determination (R²) above 0.8 and with Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) all lower than those of the RF model. Our research provides a scientific basis for monitoring, evaluating, and managing soil petroleum hydrocarbon pollution, supports the formulation of effective soil pollution prevention strategies, and offers a foundation for further research into environmental risk assessment and soil remediation.
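The comparison described in the abstract (fit an RF model and an MLR model, then score both with R², MAE, MSE, and RMSE) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic, and the feature names, value ranges, coefficients, and hyperparameters are all assumptions made for the example. Because the synthetic response is generated from a linear formula, a strong MLR fit is expected here by construction and says nothing about the study's own measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_samples=300, seed=0):
    """Generate hypothetical data. These are NOT the study's measurements."""
    rng = np.random.default_rng(seed)
    tph = rng.uniform(0.0, 5000.0, n_samples)     # assumed TPH level, mg/kg
    moisture = rng.uniform(0.0, 0.30, n_samples)  # assumed moisture fraction
    organic = rng.uniform(0.0, 0.10, n_samples)   # assumed organic matter fraction
    mineral = rng.uniform(0.0, 1.0, n_samples)    # assumed mineral index
    X = np.column_stack([tph, moisture, organic, mineral])
    # Assumed linear response with Gaussian noise (arbitrary coefficients).
    noise = rng.normal(0.0, 2.0, n_samples)
    y = 0.02 * tph - 50.0 * moisture - 100.0 * organic + 5.0 * mineral + noise
    return X, y


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit one model and return the four metrics named in the abstract."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {
        "R2": float(r2_score(y_test, pred)),
        "MAE": float(mean_absolute_error(y_test, pred)),
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }


def compare_models(seed=0):
    """Score MLR and RF on the same held-out split of the synthetic data."""
    X, y = make_synthetic_data(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "MLR": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    return {
        name: evaluate(model, X_train, X_test, y_train, y_test)
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, metrics in compare_models().items():
        print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Scoring both models on the same held-out split keeps the metrics directly comparable. With real measurements, the ranking of the two models would depend on how linear the underlying relationships are and on the sample size.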
Source journal metrics: CiteScore 7.50; self-citation rate 7.70%; articles published: 169; review time: 3.4 months
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.).
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.