ZeoReader: Automated extraction of synthesis steps from zeolite synthesis literature for autonomous experiments

IF 4.1 | Zone 2, Engineering & Technology | JCR Q2, ENGINEERING, CHEMICAL
Song He, Wenli Du, Xin Peng, Xin Li
DOI: 10.1016/j.ces.2024.120916
Journal: Chemical Engineering Science, volume 302, Article 120916
Published: 2024-11-07
Publication type: Journal Article
Open access: No
URL: https://www.sciencedirect.com/science/article/pii/S0009250924012168
Citations: 0

Abstract

Material synthesis literature documents detailed synthesis procedures, which provide valuable insight and guidance for designing practical synthesis routes. Information extraction (IE) techniques have emerged as powerful tools for obtaining structured synthesis-related data. However, current IE methods struggle to differentiate semantically similar experimental records and to extract dense experimental properties expressed in abstract terms, which limits their effectiveness in the zeolite synthesis domain. To this end, we propose ZeoReader, an end-to-end IE framework designed to extract synthesis steps from zeolite synthesis literature. Specifically, to distinguish between semantically similar descriptions of synthesis and characterization experiments, ZeoReader constructs a MatSciBERT-based paragraph classifier that offers rich prior synthesis knowledge. To improve the extraction of complete synthesis steps from complex sentences, ZeoReader develops a two-stage synthesis step extraction model, which introduces customized contrastive learning to model the distributions of dense properties and capture features of abstract expressions. Furthermore, domain-specific parsing strategies are proposed to enable ZeoReader to automatically parse PDF documents, identify synthesis experimental passages, and extract structured zeolite synthesis steps containing actions and corresponding experimental properties. Extensive experiments demonstrate that ZeoReader detects synthesis passages with an accuracy of 94.06% on out-of-sample documents and extracts experimental actions and properties with F1 scores of 93.05% and 74.99%, respectively. Our proposed IE framework can be embedded in autonomous, unmanned zeolite synthesis experiments to rapidly understand, reproduce, and validate existing experimental routes, thus facilitating new zeolite exploration.
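
To make the abstract's terms concrete, the following is a minimal Python sketch of how a structured synthesis step (an action plus its experimental properties) might be represented, and how an F1 score over extracted actions could be computed. The `SynthesisStep` class, the `f1_score` helper, the exact-match set comparison, and the example values are all illustrative assumptions; the abstract does not describe ZeoReader's data format or its matching criteria.

```python
from dataclasses import dataclass, field


@dataclass
class SynthesisStep:
    """Hypothetical container: one action and its experimental properties."""
    action: str
    properties: dict[str, str] = field(default_factory=dict)


def f1_score(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1: harmonic mean of precision and recall (exact match)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example steps, not taken from the paper.
gold_steps = [
    SynthesisStep("mix", {"reagent": "silica source"}),
    SynthesisStep("stir", {"duration": "2 h"}),
    SynthesisStep("heat", {"temperature": "150 C", "duration": "48 h"}),
    SynthesisStep("wash", {"solvent": "deionized water"}),
]
predicted_steps = [
    SynthesisStep("mix"),
    SynthesisStep("stir", {"duration": "2 h"}),
    SynthesisStep("heat", {"temperature": "150 C"}),
    SynthesisStep("dry"),
]

gold_actions = {step.action for step in gold_steps}
predicted_actions = {step.action for step in predicted_steps}
action_f1 = f1_score(predicted_actions, gold_actions)
```

Here three of the four predicted actions match the gold set, so precision and recall are both 0.75. Scoring properties the same way would require a choice about what counts as a match (for example, property name and value both exact), which is presumably one reason property extraction is reported as the harder task.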
Source journal
Chemical Engineering Science (Engineering & Technology: Engineering, Chemical)
CiteScore: 7.50
Self-citation rate: 8.50%
Articles published: 1025
Review time: 50 days
Journal description: Chemical engineering enables the transformation of natural resources and energy into useful products for society. It draws on and applies natural sciences, mathematics and economics, and has developed fundamental engineering science that underpins the discipline. Chemical Engineering Science (CES) has been publishing papers on the fundamentals of chemical engineering since 1951. CES is the platform where the most significant advances in the discipline have been published ever since. Chemical Engineering Science has accompanied and sustained chemical engineering through its development into the vibrant and broad scientific discipline it is today.