Developing deep learning-based large-scale organic reaction classification model via sigma-profiles

Impact factor: 9.1 | JCR Q1 (Engineering, Chemical)
Wenlong Wang, Chenyang Xu, Jian Du, Lei Zhang
DOI: 10.1016/j.gce.2024.06.003
Journal: Green Chemical Engineering, Volume 6, Issue 2, Pages 181-192
Publication date: 2024-06-14
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S2666952824000396
Citations: 0

Abstract

Deep learning and related technologies have accelerated the discovery of novel chemical reactions, especially in organic synthesis. With hundreds of thousands of reactions available for reference, one effective way to leverage them is to classify reactions into clusters based on their specific characteristics, which makes target-guided navigation of the vast chemical space possible. Although previous applications of deep learning to reaction classification have made substantial progress, developing a model that combines good interpretability with high accuracy for large-scale reaction classification remains an open question. In this work, a deep learning-based model for large-scale reaction classification is first constructed using a pre-trained BERT model and an autoencoder. The model is then trained on the open-source USPTO_TPL dataset, which contains recorded reactions of up to 1000 different types. The model achieves a multi-class classification accuracy of 99.382% on the test set, showing strong potential for practical use. In addition, a reaction similarity map is presented that relates the reactions in the USPTO_TPL dataset through statistical features derived from their sigma-profiles. Finally, representative reactions from the test set are provided to illustrate the model's effectiveness on the reaction classification task.
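
The abstract does not specify which sigma-profile statistics are used or how reaction similarity is measured. As a rough illustration only, the sketch below assumes a sigma-profile discretized on a grid of screening charge densities and summarizes it by four moments (area, mean, standard deviation, skewness), then compares two feature vectors by cosine similarity. The function names `profile_features` and `cosine_similarity`, the choice of moments, and the similarity measure are all assumptions made for this example, not the authors' method.

```python
import math


def profile_features(sigma, p):
    """Summarize a discretized sigma-profile by simple statistical moments.

    sigma: grid of screening charge densities (hypothetical units, e.g. e/A^2)
    p:     profile value at each grid point (non-negative)
    Returns [area, mean, standard deviation, skewness].
    """
    area = sum(p)
    if area <= 0:
        raise ValueError("sigma-profile must have positive total area")
    # Normalize the profile so it can be treated as a discrete distribution.
    w = [v / area for v in p]
    mean = sum(s * wi for s, wi in zip(sigma, w))
    var = sum(wi * (s - mean) ** 2 for s, wi in zip(sigma, w))
    std = math.sqrt(var)
    if std == 0:
        skew = 0.0
    else:
        skew = sum(wi * (s - mean) ** 3 for s, wi in zip(sigma, w)) / std ** 3
    return [area, mean, std, skew]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative (made-up) profiles on a five-point grid.
grid = [-0.02, -0.01, 0.0, 0.01, 0.02]
features_a = profile_features(grid, [1, 2, 4, 2, 1])
features_b = profile_features(grid, [0, 1, 3, 4, 2])
similarity = cosine_similarity(features_a, features_b)
```

In practice the raw moments differ by orders of magnitude, so features would normally be standardized across the dataset before any similarity is computed; that step is omitted here for brevity.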

Source journal: Green Chemical Engineering
Subject areas: Process Chemistry and Technology, Catalysis, Filtration and Separation
CiteScore: 11.60
Self-citation rate: 0.00%
Articles published: 58
Review time: 51 days