Ridhwan Lawal, Wasif Farooq, Abdulazeez Abdulraheem, Abdul Gani Abdul Jameel
Predicting soot formation in fossil fuels: A comparative study of regression and machine learning models
Digital Chemical Engineering, volume 12, Article 100172. Published 2024-08-24. DOI: 10.1016/j.dche.2024.100172
https://www.sciencedirect.com/science/article/pii/S2772508124000346
Abstract
Incomplete combustion of fossil fuels emits soot, a fine carbonaceous solid that harms human health and the environment. This study compares multiple linear regression (MLR) with three machine learning (ML) models for predicting the threshold sooting index (TSI), a widely used measure of the sooting propensity of fuels. The dataset used for model development comprises experimental TSI data for 342 fuels spanning various chemical classes, including oxygenated compounds such as ethers and alcohols. Ten input features were used: eight functionalities, molecular weight, and the branching index (BI). These input features have been shown to affect the physical and thermochemical properties of fuels. The ML models are support vector regression with the Nu parameter (NuSVR), extra trees regression (ETR), and extreme gradient boosting regression (XGBR). The models were trained, validated, and tested on randomly split subsets of the data: 56 % for training, 14 % for validation, and 30 % for testing. The accuracy of the MLR, NuSVR, ETR, and XGBR models for the entire dataset was 91 %, 96 %, 98 %, and 96 %, respectively. The mean absolute errors (MAE) of prediction were 3.4, 0.022, 0.011, and 0.028 for MLR, NuSVR, ETR, and XGBR, respectively. These results highlight the effectiveness of the ML models, whose error levels are similar to the uncertainties observed in experimental measurements. The developed ML models have been validated to ensure generalizability and can be used to predict the TSI of petroleum fuels.
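As a rough illustration of the random 56 % / 14 % / 30 % split and the MAE metric mentioned in the abstract, the sketch below partitions 342 sample indices and computes a mean absolute error on made-up numbers. The function names `split_indices` and `mean_absolute_error`, the seed, and the rounding rule are assumptions made for illustration; the abstract does not say how the authors implemented the split, and the sample values are not data from the study.

```python
import random


def split_indices(n_samples, train_frac=0.56, val_frac=0.14, seed=0):
    """Randomly partition range(n_samples) into train/validation/test lists.

    The test set receives whatever remains after the training and
    validation portions are taken (about 30 % with the defaults).
    Rounding to whole samples is an assumption of this sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# 342 fuels, as stated in the abstract.
train_idx, val_idx, test_idx = split_indices(342)
print(len(train_idx), len(val_idx), len(test_idx))  # 192 48 102

# Hypothetical TSI-like values, used only to show the MAE arithmetic.
example_mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 19.0, 33.0])
print(example_mae)  # 2.0
```

With 342 samples the fractions do not divide evenly, so the subsets come out at 192, 48, and 102 samples under this rounding rule; a different rounding choice would shift a sample or two between subsets.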