A Systematic and General Machine Learning Approach to Build a Consistent Data Set from Different Experiments: Application to the Thermal Conductivity of Methane

Impact factor: 5.1 | JCR quartile: Q2 (Engineering, Chemical)
Matheus Máximo-Canadas, Julio Cesar Duarte, Jakler Nichele, Leonardo Santos de Brito Alves, Luiz Octavio Vieira Pereira, Rogerio Ramos and Itamar Borges Jr.*
ACS Engineering Au, volume 5, issue 3, pages 226–233. Published 2025-03-11.
DOI: 10.1021/acsengineeringau.5c00001
https://pubs.acs.org/doi/10.1021/acsengineeringau.5c00001
Citation count: 0

Abstract

Experimental data from different sources present challenges due to variability and noise from various experimental conditions, apparatuses, and environmental factors. In this work, we propose a general method to address these challenges to build a consistent data set. As a case study, we analyze experimental data sets of methane’s thermal conductivity across the liquid, vapor, and supercritical phases. The method is based on machine learning (ML) techniques, which consistently integrate data from various experimental sources. It feeds raw data compiled by the National Institute of Standards and Technology (NIST) database to different ML algorithms to achieve this purpose. Our findings indicate that ML models yield predictions closer to the NIST’s processed data than to the original raw experimental data used to train the models. This demonstrates the models’ generalization from heterogeneous, noisy, and untreated data sets. While our approach does not eliminate preprocessing, it suggests that ML can autonomously handle noisy data, providing a faster and cost-effective alternative to traditional pre- and postprocessing methods. By guiding the refinement of labor-intensive methods, ML proves adaptable for real-time data, enabling immediate adjustments and revolutionizing industrial and scientific optimizations. Therefore, the proposed ML approach is general and efficient in handling complex and heterogeneous data to deliver reliable predictions without extensive preprocessing.
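
The comparison described in the abstract can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than methane measurements, the `reference` surface merely stands in for a processed (smoothed) data set, the three sources with their offsets and noise levels are hypothetical, and a plain k-nearest-neighbors average is swapped in as a generic ML regressor because the abstract does not name the algorithms used. The sketch trains on the pooled noisy measurements only, then measures how far the predictions are from the raw values and from the smooth reference.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def reference(t, p):
    """Hypothetical smooth 'processed' surface (NOT real methane data).

    t and p are normalized temperature and pressure in [0, 1].
    """
    return 0.03 + 0.10 * t + 0.05 * p + 0.04 * t * p


# Three hypothetical experimental sources: (systematic offset, noise std dev).
SOURCES = [(+0.02, 0.03), (-0.02, 0.05), (0.00, 0.04)]

# Pool the raw measurements from all sources: rows of (t, p, measured value).
raw = []
for bias, sd in SOURCES:
    for _ in range(100):
        t, p = random.random(), random.random()
        raw.append((t, p, reference(t, p) + bias + random.gauss(0.0, sd)))


def knn_predict(t, p, data, k=15):
    """Predict by averaging the k nearest raw measurements in (t, p) space."""
    nearest = sorted(data, key=lambda r: (r[0] - t) ** 2 + (r[1] - p) ** 2)[:k]
    return statistics.fmean(r[2] for r in nearest)


# The model only ever sees the raw, untreated measurements.
preds = [knn_predict(t, p, raw) for t, p, _ in raw]

# Mean absolute error against the raw values and against the smooth reference.
mae_vs_raw = statistics.fmean(abs(y - m) for y, (_, _, m) in zip(preds, raw))
mae_vs_ref = statistics.fmean(
    abs(y - reference(t, p)) for y, (t, p, _) in zip(preds, raw)
)

print(f"MAE vs raw measurements: {mae_vs_raw:.4f}")
print(f"MAE vs smooth reference: {mae_vs_ref:.4f}")
```

In this toy setting the regressor averages out source-specific noise, so its predictions sit closer to the smooth reference than to the measurements it was trained on. That is the qualitative behavior the abstract reports, although the sketch says nothing about the size of the effect on real data.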

Source journal: ACS Engineering Au (category: Chemical Engineering Technology)
Self-citation rate: 0.00%
Articles published: 0

Journal description: ACS Engineering Au is an open access journal that reports significant advances in chemical engineering, applied chemistry, and energy covering fundamentals, processes, and products. The journal's broad scope includes experimental, theoretical, mathematical, computational, chemical, and physical research from academic and industrial settings. Short letters, comprehensive articles, reviews, and perspectives are welcome on topics that include:

Fundamental research in such areas as thermodynamics, transport phenomena (flow, mixing, mass & heat transfer), chemical reaction kinetics and engineering, catalysis, separations, interfacial phenomena, and materials.
Process design, development, and intensification (e.g., process technologies for chemicals and materials synthesis and design methods, process intensification, multiphase reactors, scale-up, systems analysis, process control, data correlation schemes, modeling, machine learning, Artificial Intelligence).
Product research and development involving chemical and engineering aspects (e.g., catalysts, plastics, elastomers, fibers, adhesives, coatings, paper, membranes, lubricants, ceramics, aerosols, fluidic devices, intensified process equipment).
Energy and fuels (e.g., pre-treatment, processing, and utilization of renewable energy resources; processing and utilization of fuels; properties and structure or molecular composition of both raw fuels and refined products; fuel cells, hydrogen, batteries; photochemical fuel and energy production; decarbonization; electrification; microwave; cavitation).
Measurement techniques, computational models, and data on thermo-physical, thermodynamic, and transport properties of materials and phase equilibrium behavior.
New methods, models, and tools (e.g., real-time data analytics, multi-scale models, physics informed machine learning models, machine learning enhanced physics-based models, soft sensors, high-performance computing).