Evaluating and advancing large language models for nanofiltration membrane knowledge tasks
Xinchen Xiang, Zheng Cao, Yukun Qian, Dan Lu, Jiancong Lu, Jinyan Wang, Shiyu Zhou, Lijun Liang, Zhikan Yao, Lin Zhang
Advanced Membranes, Volume 5, Article 100161 (2025). DOI: 10.1016/j.advmem.2025.100161
https://www.sciencedirect.com/science/article/pii/S2772823425000351
Abstract
Nanofiltration (NF) is a rapidly growing field, resulting in a surge of publications with diverse focuses. It is challenging for researchers to quickly locate key information within this large body of literature. Large language models (LLMs) have shown promise in analyzing articles and reasoning about knowledge in some scientific fields, but their effectiveness in membrane research is unclear. Here, we introduce the first benchmark specifically designed for membrane studies and use it to systematically evaluate six general-purpose LLMs (Claude-3.5, Deepseek-R1, Gemini-2.0, GPT-4o-mini, Llama-3.2, and Mistral-small-3.1). Our findings reveal that the complexity and depth of NF knowledge pose a significant challenge for these LLMs, leading to poor performance, particularly on tasks involving membrane mechanisms. To enhance the use of LLMs in this field, we developed a specialized NF database and integrated it with the LLMs via Retrieval-Augmented Generation (RAG). RAG significantly improved performance across all models, with average gains of 18.5% on Question-type tasks and 10.8% on Reasoning-type tasks. Moreover, in areas such as membrane fabrication and characterization, several models with RAG outperformed human experts. These results suggest that RAG is a promising strategy for leveraging LLMs in NF research. This study introduces a new path for applying LLMs to membrane research and proposes a professional benchmark to ensure their reliable and effective use.
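To make the RAG workflow described above concrete, below is a minimal sketch of how a domain database can be combined with an LLM: retrieve the NF passages most similar to a query, then prepend them to the prompt before generation. This is not the paper's implementation; the passages, the `call_llm` placeholder, and the TF-IDF retriever are illustrative assumptions (the authors' actual database, embeddings, and prompting setup are not specified here).

```python
# Minimal RAG sketch (assumed, not the paper's pipeline): TF-IDF retrieval over a
# small NF passage list, followed by prompt assembly for an LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative stand-in for the specialized NF database.
nf_passages = [
    "Interfacial polymerization of piperazine and TMC yields polyamide NF membranes.",
    "Donnan exclusion and steric hindrance govern ion rejection in NF membranes.",
    "Surface charge of NF membranes is characterized by streaming potential measurements.",
]

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by TF-IDF cosine similarity to the query and return the top-k."""
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(passages)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    top = scores.argsort()[::-1][:k]
    return [passages[i] for i in top]

def build_prompt(query: str, context: list[str]) -> str:
    """Prepend retrieved NF context so the model answers from the database, not memory."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Use the following nanofiltration context:\n{joined}\n\nQuestion: {query}"

query = "Which mechanisms control salt rejection in NF membranes?"
prompt = build_prompt(query, retrieve(query, nf_passages))
# answer = call_llm(prompt)  # hypothetical call to any of the benchmarked LLMs
print(prompt)
```

In a full system the TF-IDF retriever would typically be replaced by dense embeddings over the curated NF corpus, but the control flow (retrieve, assemble prompt, generate) is the same as the RAG strategy evaluated in the abstract.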