Developing deep learning-based large-scale organic reaction classification model via sigma-profiles
Wenlong Wang, Chenyang Xu, Jian Du, Lei Zhang
Green Chemical Engineering, Vol. 6, No. 2, pp. 181-192 (Journal Article; publication date: 2024-06-14)
DOI: 10.1016/j.gce.2024.06.003
https://www.sciencedirect.com/science/article/pii/S2666952824000396
Citation count: 0
Abstract
Advanced technologies such as deep learning have accelerated the discovery of novel chemical reactions, especially in organic synthesis. With hundreds of thousands of reactions available for reference, one way to leverage them effectively is to classify reactions into clusters according to their specific characteristics, which enables target-guided navigation of the vast chemical space. Although previous applications of deep learning to reaction classification have made substantial progress, developing a model that is both interpretable and highly accurate on large-scale reaction classification tasks remains an open problem. In this work, a deep learning-based model for large-scale reaction classification is first constructed from a pre-trained BERT model and an autoencoder. The model is then trained on the open-source USPTO_TPL dataset, which contains recorded reactions of as many as 1000 different types. The model reaches a multi-class classification accuracy of 99.382% on the test set, indicating strong potential for practical use. In addition, a reaction similarity map is presented that correlates the reactions in the USPTO_TPL dataset based on their sigma-profile-based statistical features. Finally, representative reactions from the test set are provided to illustrate the model's effectiveness on the reaction classification task.
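The abstract does not say which sigma-profile statistics the authors computed or how the similarity map is built, and the BERT/autoencoder classifier is not shown here. As a minimal sketch only, assuming (hypothetically) that each reaction is described by low-order sigma-moments of its net sigma-profile (product profiles minus reactant profiles) and that reactions are compared by cosine similarity, the idea might look like the Python below. The sigma-moment definition, M_k = sum of p(sigma) * sigma^k, is the discrete analogue of the COSMO-RS sigma-moments; every name (`SIGMA_GRID`, `toy_profile`, `reaction_profile`, `sigma_moments`, `similarity_matrix`), the 51-bin grid, and the Gaussian toy profiles are illustrative assumptions, not taken from the paper or from real COSMO calculations.

```python
import math
from typing import List, Sequence

# Hypothetical discretization: 51 bins of screening charge density sigma,
# expressed as sigma / 0.025 e/Å^2 so every bin centre lies in [-1, 1].
# The abstract does not state the binning used in the paper.
SIGMA_GRID: List[float] = [(i - 25) / 25 for i in range(51)]


def toy_profile(center: float, area: float, width: float = 0.2) -> List[float]:
    """Hypothetical Gaussian-shaped sigma-profile, for illustration only."""
    return [area * math.exp(-(((s - center) / width) ** 2)) for s in SIGMA_GRID]


def reaction_profile(reactants: Sequence[Sequence[float]],
                     products: Sequence[Sequence[float]]) -> List[float]:
    """Net profile: sum of product profiles minus sum of reactant profiles (assumption)."""
    net = [0.0] * len(SIGMA_GRID)
    for profile in products:
        net = [n + p for n, p in zip(net, profile)]
    for profile in reactants:
        net = [n - p for n, p in zip(net, profile)]
    return net


def sigma_moments(profile: Sequence[float],
                  orders: Sequence[int] = (0, 1, 2, 3)) -> List[float]:
    """Discrete sigma-moments M_k = sum_j p(sigma_j) * sigma_j**k."""
    return [sum(p * s ** k for p, s in zip(profile, SIGMA_GRID)) for k in orders]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities: the raw data behind a reaction similarity map."""
    return [[cosine_similarity(a, b) for b in features] for a in features]


# Three toy reactions: B shifts charge density like A but with larger areas;
# C is A run in reverse.
rxn_a = reaction_profile([toy_profile(-0.3, 1.0)], [toy_profile(0.3, 1.0)])
rxn_b = reaction_profile([toy_profile(-0.3, 1.2)], [toy_profile(0.3, 1.2)])
rxn_c = reaction_profile([toy_profile(0.3, 1.0)], [toy_profile(-0.3, 1.0)])

features = [sigma_moments(r) for r in (rxn_a, rxn_b, rxn_c)]
sim = similarity_matrix(features)
print([[round(x, 3) for x in row] for row in sim])
```

Reactions A and B redistribute charge density in the same direction, so their similarity comes out close to 1, while C reverses A and scores close to -1. On real data, moments of different orders can differ by orders of magnitude, so they would likely need standardizing before any similarity is computed.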