Jibin Zhou, Feiyang Xu, Zhijun Chang, Duiping Liu, Lulu Li, Jian Cui, Yi Li, Xin Li, Li Qian, Zhixiong Zhang, Guoping Hu, Mao Ye, Zhongmin Liu
Chinese Journal of Catalysis, Volume 73, Pages 159-173. Published 2025-06-01. DOI: 10.1016/S1872-2067(25)64725-5. Article page: https://www.sciencedirect.com/science/article/pii/S1872206725647255
From lab to fab: A large language model for chemical engineering
Developing chemical technologies is a multistage process that spans laboratory research, scale-up, and industrial deployment and requires interdisciplinary collaboration, so it often incurs substantial time and economic costs. To address these challenges, we report ChemELLM, a domain-specific large language model (LLM) with 70 billion parameters for chemical engineering. ChemELLM demonstrates state-of-the-art performance across critical tasks ranging from foundational understanding to professional problem-solving. It outperforms mainstream LLMs (e.g., O1-Preview, GPT-4o, and DeepSeek-R1) on ChemEBench, the first multidimensional benchmark for chemical engineering, which encompasses 15 dimensions across 101 distinct essential tasks. To support robust model development, we curated ChemEData, a purpose-built dataset containing 19 billion tokens for pre-training and 1 billion tokens for fine-tuning. This work establishes a new paradigm for artificial intelligence-driven innovation, bridging the gap between laboratory-scale innovation and industrial-scale implementation and thereby accelerating technological advancement in chemical engineering. ChemELLM is publicly available at https://chemindustry.iflytek.com/chat.
About the journal:
The journal covers a broad scope, encompassing new trends in catalysis for applications in energy production, environmental protection, and the preparation of materials, petroleum chemicals, and fine chemicals. It explores the scientific foundation for preparing and activating catalysts of commercial interest, emphasizing representative models. The focus includes spectroscopic methods for structural characterization, especially in situ techniques, as well as new theoretical methods with practical impact in catalysis and catalytic reactions. The journal delves into the relationship between homogeneous and heterogeneous catalysis and includes theoretical studies on the structure and reactivity of catalysts. Additionally, contributions on photocatalysis, biocatalysis, surface science, and catalysis-related chemical kinetics are welcomed.