MERMaid: Universal multimodal mining of chemical reactions from PDFs using vision-language models
Shi Xuan Leong, Sergio Pablo-García, Brandon Wong, Alán Aspuru-Guzik
Matter (journal article), published 2025-07-25. DOI: 10.1016/j.matt.2025.102331
Citations: 0
Abstract
Data digitization of scientific literature is essential for creating machine-actionable knowledge bases to advance data-driven research and integrate with self-driving laboratories. It is especially critical to extract, interpret, and structure data from graphical elements, the primary medium for conveying complex scientific insights. However, this remains challenging due to the inherent lack of semantic structure in the prevalent PDF format, the complexity of visual content, and the need for multimodal integration. We present MERMaid (multimodal aid for reaction mining), an end-to-end pipeline that converts disparate visual data across PDFs into a coherent knowledge graph. Leveraging the emergent visual cognition and reasoning capabilities of vision-language models, MERMaid demonstrates chemical context awareness, self-directed context completion, and robust coreference resolution to achieve 87% end-to-end accuracy across three chemical domains. Its modular design facilitates future application to diverse data beyond reaction mining, promising to unlock the full potential of scientific literature for knowledge-intensive applications.
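The abstract does not describe MERMaid's internal data model. The following is a purely illustrative sketch of how coreference resolution and knowledge-graph assembly could fit together. It assumes that reactions have already been extracted from figures or tables as records that refer to compounds by in-paper labels such as "1a". Those labels are resolved against a label-to-SMILES table, so a compound referenced in different figures collapses into a single graph node. All class, function, and field names (`Reaction`, `resolve`, `build_graph`, `label_map`) and the example chemistry are hypothetical. This is not the authors' implementation, and the vision-language-model extraction step is omitted entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Reaction:
    # Hypothetical record for one reaction read from a figure or table.
    reaction_id: str
    reactants: list[str]  # SMILES strings or in-paper labels such as "1a"
    products: list[str]
    conditions: dict[str, str] = field(default_factory=dict)


def resolve(name: str, label_map: dict[str, str]) -> str:
    """Coreference step: replace a compound label with the structure it was
    defined as elsewhere in the paper; unknown names are returned unchanged."""
    return label_map.get(name, name)


def build_graph(
    reactions: list[Reaction], label_map: dict[str, str]
) -> tuple[set[str], list[tuple[str, str, str]]]:
    """Assemble a minimal knowledge graph as (nodes, edges).

    Each edge is a (source, relation, target) triple. Compounds that resolve
    to the same structure share one node, linking reactions across figures.
    """
    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for rxn in reactions:
        nodes.add(rxn.reaction_id)
        for name in rxn.reactants:
            cid = resolve(name, label_map)
            nodes.add(cid)
            edges.append((cid, "reactant_of", rxn.reaction_id))
        for name in rxn.products:
            cid = resolve(name, label_map)
            nodes.add(cid)
            edges.append((rxn.reaction_id, "produces", cid))
    return nodes, edges


# Hypothetical example: labels defined in one figure, reused in another.
label_map = {
    "1a": "Brc1ccccc1",  # bromobenzene
    "2a": "c1ccc(-c2ccccc2)cc1",  # biphenyl
    "3a": "O=[N+]([O-])c1ccc(-c2ccccc2)cc1",  # 4-nitrobiphenyl
}
reactions = [
    Reaction("rxn-1", ["1a", "OB(O)c1ccccc1"], ["2a"], {"catalyst": "Pd(PPh3)4"}),
    Reaction("rxn-2", ["2a"], ["3a"], {"reagent": "HNO3/H2SO4"}),
]
nodes, edges = build_graph(reactions, label_map)
```

Here "2a" is the product of one reaction and the starting material of the other. Because both mentions resolve to the same SMILES string, the two reactions end up connected through a shared node rather than through two unrelated labels. This is the kind of cross-figure linking that the abstract attributes to coreference resolution.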
About the journal:
Matter, a monthly journal affiliated with Cell, spans the broad field of materials science from nano to macro levels, covering fundamentals to applications. Embracing groundbreaking technologies, it includes full-length research articles, reviews, perspectives, previews, opinions, personnel stories, and general editorial content.
Matter aims to be the primary resource for researchers in academia and industry, inspiring the next generation of materials scientists.