An Improved Machine Learning Model for Pure Component Property Estimation

Impact factor: 10.1 · Zone 1 (Engineering and Technology) · JCR Q1 (Engineering, Multidisciplinary)
Xinyu Cao, Ming Gong, Anjan Tula, Xi Chen, Rafiqul Gani, Venkat Venkatasubramanian
Engineering, Volume 39, Pages 61-73. Published 2024-08-01. DOI: 10.1016/j.eng.2023.08.024
https://www.sciencedirect.com/science/article/pii/S2095809924001590

Abstract

Information on the physicochemical properties of chemical species is an important prerequisite when performing tasks such as process design and product design. However, the lack of extensive data and high experimental costs hinder the development of prediction techniques for these properties. Moreover, accuracy and predictive capabilities still limit the scope and applicability of most property estimation methods. This paper proposes a new Gaussian process-based modeling framework that aims to manage a discrete and high-dimensional input space related to molecular structure representation with the group-contribution approach. A warping function is used to map discrete input into a continuous domain in order to adjust the correlation between different compounds. Prior selection techniques, including prior elicitation and prior predictive checking, are also applied during the building procedure to provide the model with more information from previous research findings. The framework is assessed using datasets of varying sizes for 20 pure component properties. For 18 out of the 20 pure component properties, the new models are found to give improved accuracy and predictive power in comparison with other published models, with and without machine learning.
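The abstract names the ingredients (group-contribution inputs, a warping function, a Gaussian process) but not their exact forms. The following is only a minimal sketch of the general idea, not the authors' model. It assumes a logarithmic warp with a per-group scale, a squared-exponential kernel, and a zero-mean prior; the function names `warp`, `rbf_kernel`, and `gp_predict`, the three-group layout, and the toy target values are all hypothetical and synthetic.

```python
import numpy as np

def warp(counts, scale):
    """Hypothetical warp: map integer group counts to a continuous domain.

    The log1p form and the per-group scale are assumptions for illustration;
    the abstract does not state the warping function actually used.
    """
    return scale * np.log1p(np.asarray(counts, dtype=float))

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, scale, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP on warped inputs."""
    z_train, z_test = warp(x_train, scale), warp(x_test, scale)
    k_tt = rbf_kernel(z_train, z_train) + noise * np.eye(len(z_train))
    k_st = rbf_kernel(z_test, z_train)
    chol = np.linalg.cholesky(k_tt)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = rbf_kernel(z_test, z_test).diagonal() - (v**2).sum(axis=0)
    return mean, var

# Each row counts occurrences of three hypothetical groups in one compound.
x_train = np.array([[2, 0, 0], [2, 1, 0], [2, 2, 0], [1, 1, 1], [1, 2, 1]])
# Synthetic targets from a made-up linear rule; not real property data.
y_raw = x_train @ np.array([1.0, 0.5, 2.0])
y_train = y_raw - y_raw.mean()  # center, since the GP prior has zero mean

scale = np.array([1.0, 1.0, 1.0])  # assumed warp scale per group
mean_train, var_train = gp_predict(x_train, y_train, x_train, scale)
x_new = np.array([[5, 5, 5]])  # a compound far from the training set
mean_new, var_new = gp_predict(x_train, y_train, x_new, scale)

print("posterior mean at training points:", mean_train)
print("posterior variance at new point:", var_new)
```

In this sketch the warp changes the distances between compounds before the kernel sees them, which is one way the correlation between compounds can be adjusted. In a fuller treatment the warp scale and kernel hyperparameters would be given priors and fitted rather than fixed by hand.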
