ZeoReader: Automated extraction of synthesis steps from zeolite synthesis literature for autonomous experiments
Authors: Song He, Wenli Du, Xin Peng, Xin Li
Journal: Chemical Engineering Science, volume 302, Article 120916
Publication date: 2024-11-07
Publication type: Journal Article
DOI: 10.1016/j.ces.2024.120916
URL: https://www.sciencedirect.com/science/article/pii/S0009250924012168
Journal impact factor: 4.1 (JCR Q2, Engineering, Chemical)
Citations: 0
Abstract
Material synthesis literature documents detailed synthesis procedures, which provide valuable insight and guidance for designing practical synthesis routes. Information extraction (IE) techniques have emerged as powerful tools for obtaining structured synthesis-related data. However, current IE methods struggle to differentiate semantically similar experimental records and to extract dense experimental properties described with abstract expressions, which limits their effectiveness in the zeolite synthesis domain. To address this, we propose ZeoReader, an end-to-end IE framework designed to extract synthesis steps from zeolite synthesis literature. Specifically, to distinguish between semantically similar descriptions of synthesis and characterization experiments, ZeoReader constructs a MatSciBERT-based paragraph classifier that offers rich prior synthesis knowledge. To improve the extraction of complete synthesis steps from complex sentences, ZeoReader develops a two-stage synthesis step extraction model, which introduces customized contrastive learning to model the distributions of dense properties and capture the features of abstract expressions. Furthermore, domain-specific parsing strategies are proposed that enable ZeoReader to automatically parse PDF documents, identify synthesis experimental passages, and extract structured zeolite synthesis steps containing actions and their corresponding experimental properties. Extensive experiments demonstrate that ZeoReader detects synthesis passages with an accuracy of 94.06% on out-of-sample documents and extracts experimental actions and properties with F1 scores of 93.05% and 74.99%, respectively. The proposed IE framework can be embedded in autonomous, unmanned zeolite synthesis experiments to rapidly understand, reproduce, and validate existing experimental routes, thus facilitating the exploration of new zeolites.
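To make the output format and the reported F1 metric concrete, the following is a minimal sketch and not the authors' implementation. The `SynthesisStep` schema, the `f1_score` and `score_steps` helpers, and the exact-match scoring rule are all assumptions introduced here for illustration. The abstract does not specify ZeoReader's data model or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisStep:
    """Hypothetical schema: one action plus its (name, value) properties."""
    action: str
    properties: tuple = ()  # e.g. (("temperature", "160 C"), ("time", "48 h"))


def f1_score(predicted: set, gold: set) -> float:
    """Exact-match F1 between a predicted set and a gold set (assumed rule)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_steps(predicted: list, gold: list) -> tuple:
    """Return (action F1, property F1) for two lists of SynthesisStep."""
    def actions(steps):
        return {s.action for s in steps}

    def properties(steps):
        # A property counts as correct only when attached to the right action.
        return {(s.action, name, value) for s in steps for name, value in s.properties}

    return (
        f1_score(actions(predicted), actions(gold)),
        f1_score(properties(predicted), properties(gold)),
    )


# Illustrative data only; not taken from the paper.
gold = [
    SynthesisStep("stir", (("time", "2 h"),)),
    SynthesisStep("heat", (("temperature", "160 C"), ("time", "48 h"))),
]
pred = [
    SynthesisStep("stir", (("time", "2 h"),)),
    SynthesisStep("heat", (("temperature", "160 C"),)),  # misses the heating time
]

action_f1, property_f1 = score_steps(pred, gold)
print(f"action F1 = {action_f1:.2f}, property F1 = {property_f1:.2f}")
```

In this toy case both actions are recovered but one of three properties is missed, so the property score falls below the action score. This matches the direction of the gap reported in the abstract (93.05% for actions versus 74.99% for properties), though the example says nothing about its cause.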
About the journal:
Chemical engineering enables the transformation of natural resources and energy into useful products for society. It draws on and applies natural sciences, mathematics and economics, and has developed fundamental engineering science that underpins the discipline.
Chemical Engineering Science (CES) has been publishing papers on the fundamentals of chemical engineering since 1951. Since then, CES has been the platform where the most significant advances in the discipline have been published. The journal has accompanied and sustained chemical engineering through its development into the vibrant and broad scientific discipline it is today.