BatGPT-Chem: A Foundation Large Model for Chemical Engineering.

IF 10.7 | Zone 1, Multidisciplinary Journals | JCR Q1, Multidisciplinary
Research Pub Date : 2025-09-10 eCollection Date: 2025-01-01 DOI:10.34133/research.0827
Yifei Yang, Runhan Shi, Zuchao Li, Shu Jiang, Bao-Liang Lu, Qibin Zhao, Yang Yang, Hai Zhao
Research, volume 8, article 0827. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12421729/pdf/

Abstract

Large language models (LLMs) have showcased remarkable capabilities in the realm of AI for Science, and chemistry has greatly benefited from the advancement of AI tools. With a strong capacity for learning sequential data like natural language, LLMs offer immense potential. Despite this promise, the application of LLMs in chemistry remains limited, with few models specifically designed for chemical data and tasks. Hence, we propose leveraging LLMs to comprehensively model both chemical sequences and natural language sequences, aiming to tackle diverse chemical tasks. We introduce BatGPT-Chem, a general foundation large-scale model with 15 billion parameters tailored for chemical engineering. Built on a corpus of over 100 million chemical instances, BatGPT-Chem specializes in 5 core tasks: retrosynthesis prediction, molecule design, molecule description, product inference, and yield prediction. BatGPT-Chem comprehensively models the information flow between chemical language and natural language, enabling full-spectrum prediction across chemical tasks. It is one of the largest bilingual chemistry-specific LLMs, supporting both English and Chinese for input and output. BatGPT-Chem is also the first automated retrosynthesis tool capable of explicitly predicting reaction conditions, a critical but often overlooked aspect in previous models. Through rigorous zero-shot evaluations, BatGPT-Chem demonstrates state-of-the-art performance, surpassing both existing chemical LLMs and general-purpose models in accuracy and validity across a diverse range of tasks. Notably, it demonstrates superior ability in predicting both reactants and reaction conditions, as well as strong generalization in low-data settings. These results suggest that BatGPT-Chem is among the most advanced and practical chemical LLMs, with strong potential to support real-world applications in synthesis planning, drug discovery, and materials design.

Source journal: Research (Multidisciplinary)
CiteScore: 13.40
Self-citation rate: 3.60%
Articles published: 0
Review time: 14 weeks
About the journal: Research serves as a global platform for academic exchange, collaboration, and technological advancement. The journal welcomes high-quality research contributions from any domain and from authors around the globe. It covers fundamental research in the life and physical sciences and also highlights significant findings and issues in engineering and applied science. It publishes original research articles, reviews, perspectives, and editorials.