R. Odobesku, K. Romanova, S. Mirzaeva, O. Zagorulko, R. Sim, R. Khakimullin, J. Razlivina, A. Dmitrenko, V. Vinogradov
{"title":"Agent-based multimodal information extraction for nanomaterials","authors":"R. Odobesku, K. Romanova, S. Mirzaeva, O. Zagorulko, R. Sim, R. Khakimullin, J. Razlivina, A. Dmitrenko, V. Vinogradov","doi":"10.1038/s41524-025-01674-7","DOIUrl":null,"url":null,"abstract":"<p>Automating structured data extraction from scientific literature is a critical challenge with broad implications across domains. We introduce nanoMINER, a multi-agent system combining large language models and multimodal analysis to extract essential information from scientific research articles on nanomaterials. This system processes documents end-to-end, utilizing tools such as YOLO for visual data extraction and GPT-4o for linking textual and visual information. At its core, the ReAct agent orchestrates specialized agents to ensure comprehensive data extraction. We demonstrate the efficacy of the system by automating the assembly of nanomaterial and nanozyme datasets previously manually curated by domain experts. NanoMINER achieves high precision in extracting nanomaterial properties like chemical formulas, crystal systems, and surface characteristics. For nanozymes, we obtain near-perfect precision (0.98) for kinetic parameters and essential features such as Cmin and Cmax. To benchmark the system performance, we also compare nanoMINER to several baseline LLMs, including the most recent multimodal GPT-4.1, and show consistently higher extraction precision and recall. Our approach is extensible to other domains of materials science and fields like biomedicine, advancing data-driven research methodologies and automated knowledge extraction.</p>","PeriodicalId":19342,"journal":{"name":"npj Computational Materials","volume":"18 1","pages":""},"PeriodicalIF":9.4000,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"npj Computational Materials","FirstCategoryId":"88","ListUrlMain":"https://doi.org/10.1038/s41524-025-01674-7","RegionNum":1,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Automating structured data extraction from scientific literature is a critical challenge with broad implications across domains. We introduce nanoMINER, a multi-agent system combining large language models and multimodal analysis to extract essential information from scientific research articles on nanomaterials. This system processes documents end-to-end, utilizing tools such as YOLO for visual data extraction and GPT-4o for linking textual and visual information. At its core, the ReAct agent orchestrates specialized agents to ensure comprehensive data extraction. We demonstrate the efficacy of the system by automating the assembly of nanomaterial and nanozyme datasets previously manually curated by domain experts. NanoMINER achieves high precision in extracting nanomaterial properties like chemical formulas, crystal systems, and surface characteristics. For nanozymes, we obtain near-perfect precision (0.98) for kinetic parameters and essential features such as Cmin and Cmax. To benchmark the system performance, we also compare nanoMINER to several baseline LLMs, including the most recent multimodal GPT-4.1, and show consistently higher extraction precision and recall. Our approach is extensible to other domains of materials science and fields like biomedicine, advancing data-driven research methodologies and automated knowledge extraction.
期刊介绍:
npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches for the design of new materials and enhancing our understanding of existing ones. The journal also welcomes papers on new computational techniques and the refinement of current approaches that support these aims, as well as experimental papers that complement computational findings.
Some key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), article downloads of 1,138,590 (2021), and a fast turnaround time of 11 days from submission to the first editorial decision. The journal is indexed in various databases and services, including Chemical Abstracts Service (ACS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.