Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model

IF 7.4 | Zone 1 (Chemistry) | JCR Q1 | CHEMISTRY, MULTIDISCIPLINARY
Yufan Chen, Ching Ting Leung, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao
Journal: Chemical Science
Publication date: 2025-10-07
Publication type: Journal Article
DOI: 10.1039/d5sc04173b (https://doi.org/10.1039/d5sc04173b)
Citations: 0

Abstract

Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with a specially designed large-scale dataset generation method to support model training, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing state-of-the-art methods by an average of 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses.
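To make the reported metric concrete, the following is a minimal sketch of what a machine-readable reaction record and an F1 computation over parsed reactions could look like. The abstract does not specify the record format or the matching criterion, so the `Reaction` class, its field names, the example SMILES strings, and the exact-match rule in `f1_score` are all assumptions made for illustration. They are not RxnIM's actual output schema or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical machine-readable record for one parsed reaction."""
    reactants: frozenset[str]   # assumed to be SMILES strings
    conditions: frozenset[str]  # reagents, solvents, temperature, etc. as text
    products: frozenset[str]    # assumed to be SMILES strings


def f1_score(predicted: set[Reaction], gold: set[Reaction]) -> float:
    """F1 under an assumed exact-match criterion between reactions."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # correct among predictions
    recall = true_positives / len(gold)          # correct among annotations
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: two annotated reactions, two predicted ones.
esterification = Reaction(
    reactants=frozenset({"CC(=O)O", "CCO"}),
    conditions=frozenset({"H2SO4", "reflux"}),
    products=frozenset({"CC(=O)OCC"}),
)
reduction = Reaction(
    reactants=frozenset({"CC(=O)C"}),
    conditions=frozenset({"NaBH4", "MeOH"}),
    products=frozenset({"CC(O)C"}),
)
wrong_prediction = Reaction(
    reactants=frozenset({"CC(=O)C"}),
    conditions=frozenset({"NaBH4"}),  # solvent missed, so no exact match
    products=frozenset({"CC(O)C"}),
)

gold = {esterification, reduction}
predicted = {esterification, wrong_prediction}
score = f1_score(predicted, gold)
```

Here one of the two predictions matches one of the two annotations, so precision and recall are both 0.5 and the F1 score is 0.5. A softer matching rule (for example, scoring reactants, conditions, and products separately) would give partial credit to the second prediction.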
Source journal: Chemical Science
Category: CHEMISTRY, MULTIDISCIPLINARY
CiteScore: 14.40
Self-citation rate: 4.80%
Articles published: 1352
Review time: 2.1 months
Journal description: Chemical Science is a multidisciplinary journal covering the chemical sciences. It publishes ground-breaking research that has significant implications for its own field and appeals to a wider audience in related areas. To be considered for publication, articles must present innovative and original advances and be understandable to scientists from diverse backgrounds. The journal generally does not publish highly specialized research.