Machine learning and deep learning enabled fuel sooting tendency prediction from molecular structure

IF 2.7 | CAS Zone 4, Biology | JCR Q2, BIOCHEMICAL RESEARCH METHODS
Runzhao Li, Jose Martin Herreros, Athanasios Tsolakis, Wenzhao Yang
Journal of Molecular Graphics and Modelling, Vol. 111, Article 108083, published 2022-03-01. DOI: 10.1016/j.jmgm.2021.108083. https://www.sciencedirect.com/science/article/pii/S1093326321002540
Citations: 3

Abstract

Soot formation models are becoming increasingly important in the formulation of advanced renewable fuels aimed at reducing soot. This work evaluates the performance of machine learning (ML) and deep learning (DL) in predicting the yield sooting index (YSI) from chemical structure, and proposes a tailor-made convolutional neural network (CNN), SDSeries38, for this regression problem. In the ML approach, a novel quantitative structure-property relationship (QSPR) is developed for feature extraction, and an ML algorithm then learns the relationship between molecular structure and YSI. In the DL approach, SDSeries38 contains 9 feature learning modules and 1 regression module, which perform automated feature learning and regression. It adopts a standard series network architecture with a modular structure: each feature learning module is a stack of convolution, batch normalization, activation and pooling layers. The ML-QSPR model outperforms SDSeries38 in both accuracy (RMSE = 7.563 vs 19.58) and computational speed, and it is also applicable to fuel mixtures. Within DL, SDSeries38 outperforms 10 classical CNNs and provides a generic architecture that can be transferred to other regression problems. The application of DL to regression is still in its infancy, and there is no complete guide to developing CNN architectures specifically for regression. Several gaps remain: (1) CNN architectures developed specifically for regression are required; (2) classical CNN architectures transferred directly from classification to regression perform only modestly, and a modular structure built from typical functional modules may provide an ideal solution; (3) going deeper in the sequence of convolution layers improves predictive accuracy, but the number of layers should be kept below a threshold to avoid vanishing gradients.
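The feature learning module described in the abstract (convolution, batch normalization, activation, pooling) and the RMSE metric used to compare the two models can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' SDSeries38 implementation: it assumes a 1-D single-channel input, a ReLU activation, non-overlapping max pooling with window 2, and omits the learnable batch-norm scale and shift. All function names (`conv1d`, `batch_norm`, `relu`, `max_pool`, `feature_module`, `rmse`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def conv1d(x: Vector, kernel: Vector) -> Vector:
    """'Valid' 1-D cross-correlation, which is what CNN layers call convolution."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def batch_norm(batch: List[Vector], eps: float = 1e-5) -> List[Vector]:
    """Normalize a single channel to zero mean and unit variance using statistics
    pooled over every sample and position in the batch (training mode; the
    learnable scale and shift parameters are omitted for brevity)."""
    values = [v for row in batch for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in batch]


def relu(x: Vector) -> Vector:
    """Activation layer (ReLU is an assumed choice)."""
    return [max(0.0, v) for v in x]


def max_pool(x: Vector, size: int = 2) -> Vector:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def feature_module(batch: List[Vector], kernel: Vector) -> List[Vector]:
    """Hypothetical feature learning module: conv -> batch norm -> activation -> pooling."""
    conv = [conv1d(x, kernel) for x in batch]
    normed = batch_norm(conv)
    return [max_pool(relu(row)) for row in normed]


def rmse(y_true: Vector, y_pred: Vector) -> float:
    """Root-mean-square error, the accuracy metric quoted in the abstract."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

For example, `feature_module([[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]], [1, 0, -1])` turns two length-6 inputs into two length-2 feature vectors. SDSeries38 stacks nine such modules in series before its regression module; in this sketch, stacking would mean feeding the output of one `feature_module` call into the next, which only works while the sequence is still long enough to survive each convolution and pooling step.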


Source journal: Journal of Molecular Graphics and Modelling (Biology; Computer Science, Interdisciplinary Applications)
CiteScore: 5.50
Self-citation rate: 6.90%
Articles published: 216
Review time: 35 days
Journal description: The Journal of Molecular Graphics and Modelling is devoted to the publication of papers on the uses of computers in theoretical investigations of molecular structure, function, interaction, and design. The scope of the journal includes all aspects of molecular modeling and computational chemistry, including, for instance, the study of molecular shape and properties, molecular simulations, protein and polymer engineering, drug design, materials design, structure-activity and structure-property relationships, database mining, and compound library design. As a primary research journal, JMGM seeks to bring new knowledge to the attention of its readers. Submissions to the journal therefore need not only to report results, but must draw conclusions and explore the implications of the work presented. Authors are strongly encouraged to bear this in mind when preparing manuscripts. Routine applications of standard modelling approaches, providing only very limited new scientific insight, will not meet our criteria for publication. Reproducibility of reported calculations is an important issue. Wherever possible, we urge authors to enhance their papers with Supplementary Data: for example, machine-readable versions of molecular datasets in QSAR studies, or versions of the topology and force-field parameter files when developing new force-field parameters. Routine applications of existing methods that do not lead to genuinely new insight will not be considered.