MERMaid: Universal multimodal mining of chemical reactions from PDFs using vision-language models

IF 17.3; Zone 1 (Materials Science); Q1, MATERIALS SCIENCE, MULTIDISCIPLINARY
Matter Pub Date : 2025-07-25 DOI:10.1016/j.matt.2025.102331
Shi Xuan Leong, Sergio Pablo-García, Brandon Wong, Alán Aspuru-Guzik
Citations: 0

Abstract

Data digitization of scientific literature is essential for creating machine-actionable knowledge bases to advance data-driven research and integrate with self-driving laboratories. It is especially critical to extract, interpret, and structure data from graphical elements, the primary medium for conveying complex scientific insights. However, this remains challenging due to the inherent lack of semantic structure in the prevalent PDF format, the complexity of visual content, and the need for multimodal integration. We present MERMaid (multimodal aid for reaction mining), an end-to-end pipeline that converts disparate visual data across PDFs into a coherent knowledge graph. Leveraging the emergent visual cognition and reasoning capabilities of vision-language models, MERMaid demonstrates chemical context awareness, self-directed context completion, and robust coreference resolution to achieve 87% end-to-end accuracy across three chemical domains. Its modular design facilitates future application to diverse data beyond reaction mining, promising to unlock the full potential of scientific literature for knowledge-intensive applications.
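As a rough illustration of what "converting extracted reaction data into a knowledge graph" with coreference resolution might involve, the sketch below turns one reaction record into subject-predicate-object triples. This is not MERMaid's actual schema or code: the `ReactionRecord` fields, the predicate names, the alias table, and the example compounds are all hypothetical and chosen only to make the idea concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One edge of the graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class ReactionRecord:
    """Hypothetical record for one reaction read from a figure or table."""
    reaction_id: str
    source_pdf: str
    reactants: list[str] = field(default_factory=list)
    products: list[str] = field(default_factory=list)
    conditions: dict[str, str] = field(default_factory=dict)


def resolve_aliases(names: list[str], aliases: dict[str, str]) -> list[str]:
    """Replace figure labels such as '1a' with full names when known.

    A toy stand-in for coreference resolution: labels without an entry
    in the alias table are passed through unchanged.
    """
    return [aliases.get(name, name) for name in names]


def to_triples(record: ReactionRecord, aliases: dict[str, str]) -> list[Triple]:
    """Flatten one reaction record into knowledge-graph triples."""
    triples = [Triple(record.reaction_id, "reported_in", record.source_pdf)]
    for name in resolve_aliases(record.reactants, aliases):
        triples.append(Triple(record.reaction_id, "has_reactant", name))
    for name in resolve_aliases(record.products, aliases):
        triples.append(Triple(record.reaction_id, "has_product", name))
    # Sort condition keys so the output order is deterministic.
    for key, value in sorted(record.conditions.items()):
        triples.append(Triple(record.reaction_id, f"has_{key}", value))
    return triples


# Made-up example data, for illustration only.
aliases = {"1a": "benzaldehyde", "2a": "benzyl alcohol"}
record = ReactionRecord(
    reaction_id="rxn-1",
    source_pdf="example.pdf",
    reactants=["1a"],
    products=["2a"],
    conditions={"solvent": "MeCN", "temperature": "25 C"},
)
triples = to_triples(record, aliases)
for t in triples:
    print(f"{t.subject} --{t.predicate}--> {t.obj}")
```

Flat triples are used here because they load easily into most graph stores. A real pipeline would also need to handle ambiguous labels, units, and provenance at a finer grain than a single file name.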


Source journal: Matter (MATERIALS SCIENCE, MULTIDISCIPLINARY)
CiteScore: 26.30
Self-citation rate: 2.60%
Articles published: 367
About the journal: Matter, a monthly journal affiliated with Cell, spans the broad field of materials science from nano to macro levels, covering fundamentals to applications. Embracing groundbreaking technologies, it includes full-length research articles, reviews, perspectives, previews, opinions, personnel stories, and general editorial content. Matter aims to be the primary resource for researchers in academia and industry, inspiring the next generation of materials scientists.