Evaluating and Advancing Large Language Models for Water Knowledge Tasks in Engineering and Research

IF 8.9 · CAS Tier 2, Environmental Science & Ecology · JCR Q1, Engineering, Environmental
Boyan Xu, Zihao Li, Yuxin Yang, Guanlan Wu, Chengzhi Wang, Xiongpeng Tang, Yu Li, Zihao Wu, Qingxian Su, Xueqing Shi, Yue Yang, Rui Tong, Liang Wen* and How Yong Ng*
{"title":"Evaluating and Advancing Large Language Models for Water Knowledge Tasks in Engineering and Research","authors":"Boyan Xu,&nbsp;Zihao Li,&nbsp;Yuxin Yang,&nbsp;Guanlan Wu,&nbsp;Chengzhi Wang,&nbsp;Xiongpeng Tang,&nbsp;Yu Li,&nbsp;Zihao Wu,&nbsp;Qingxian Su,&nbsp;Xueqing Shi,&nbsp;Yue Yang,&nbsp;Rui Tong,&nbsp;Liang Wen* and How Yong Ng*,&nbsp;","doi":"10.1021/acs.estlett.5c0003810.1021/acs.estlett.5c00038","DOIUrl":null,"url":null,"abstract":"<p >Although large language models (LLMs) have demonstrated significant value in numerous fields, there remains limited research on evaluating their performance or enhancing their capabilities within water science and technology. This study initially assessed the performance of eight foundational models (i.e., GPT-4, GPT-3.5, Gemini, GLM-4, ERNIE, QWEN, Llama3-8B, and Llama3-70B) on a wide range of water knowledge tasks in engineering and research by developing an evaluation suite called WaterER (i.e., 1043 tasks). GPT-4 was demonstrated to excel in diverse water knowledge tasks in engineering and research. Llama3-70B was best for Chinese engineering queries, while Chinese-oriented models outperformed GPT-3.5 in English engineering tasks. Gemini demonstrated specialized academic capabilities in wastewater treatment, environmental restoration, drinking water treatment, sanitation, anaerobic digestion, and contaminants. To further advance LLMs, we employed prompt engineering (i.e., five-shot learning) and fine-tuned open-sourced Llama3-8B into a specialized model, namely, WaterGPT. WaterGPT exhibited enhanced reasoning capabilities, outperforming Llama3-8B by over 135.4% on English engineering tasks and 18.8% on research tasks. Additionally, fine-tuning proved to be more reliable and effective than prompt engineering. Collectively, this study established various LLMs’ baseline performance in water sectors while highlighting the robust evaluation frameworks and augmentation techniques to ensure the effective and reliable use of LLMs.</p>","PeriodicalId":37,"journal":{"name":"Environmental Science & Technology Letters Environ.","volume":"12 3","pages":"289–296 289–296"},"PeriodicalIF":8.9000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Environmental Science & Technology Letters Environ.","FirstCategoryId":"1","ListUrlMain":"https://pubs.acs.org/doi/10.1021/acs.estlett.5c00038","RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ENVIRONMENTAL","Score":null,"Total":0}
Citations: 0

Abstract

Although large language models (LLMs) have demonstrated significant value in numerous fields, there remains limited research on evaluating their performance or enhancing their capabilities within water science and technology. This study first assessed the performance of eight foundational models (i.e., GPT-4, GPT-3.5, Gemini, GLM-4, ERNIE, QWEN, Llama3-8B, and Llama3-70B) on a wide range of water knowledge tasks in engineering and research by developing an evaluation suite called WaterER (1043 tasks). GPT-4 was demonstrated to excel across diverse water knowledge tasks in engineering and research. Llama3-70B was best for Chinese engineering queries, while Chinese-oriented models outperformed GPT-3.5 on English engineering tasks. Gemini demonstrated specialized academic capabilities in wastewater treatment, environmental restoration, drinking water treatment, sanitation, anaerobic digestion, and contaminants. To further advance LLMs, we employed prompt engineering (i.e., five-shot learning) and fine-tuned the open-source Llama3-8B into a specialized model, namely WaterGPT. WaterGPT exhibited enhanced reasoning capabilities, outperforming Llama3-8B by over 135.4% on English engineering tasks and 18.8% on research tasks. Additionally, fine-tuning proved to be more reliable and effective than prompt engineering. Collectively, this study established various LLMs’ baseline performance in water sectors while highlighting the robust evaluation frameworks and augmentation techniques needed to ensure the effective and reliable use of LLMs.
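This listing does not include the study's code, so the block below is only a minimal sketch of how a WaterER-style evaluation with five-shot prompting could be assembled. The task file `water_er.json`, its field names, the exemplar set, and the exact-match scoring rule are illustrative assumptions, not the authors' released benchmark.

```python
# Minimal sketch of a WaterER-style evaluation with five-shot prompting.
# File names, field names, and the scoring rule are assumptions for
# illustration; they are not taken from the paper.
import json

from openai import OpenAI  # any OpenAI-compatible chat client works here

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def build_five_shot_prompt(exemplars, question):
    """Prepend five worked examples to the target question (few-shot prompting)."""
    parts = ["Answer the following water-engineering question with a single option letter."]
    for ex in exemplars[:5]:
        parts.append(f"Q: {ex['question']}\nA: {ex['answer']}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def evaluate(tasks, exemplars, model="gpt-4"):
    """Return simple accuracy of `model` over a list of {'question', 'answer'} items."""
    correct = 0
    for task in tasks:
        prompt = build_five_shot_prompt(exemplars, task["question"])
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # deterministic decoding for benchmarking
        )
        prediction = reply.choices[0].message.content.strip()
        correct += prediction.startswith(task["answer"])
    return correct / len(tasks)


if __name__ == "__main__":
    # 'water_er.json' is a hypothetical local export of the 1043-item suite.
    with open("water_er.json") as f:
        data = json.load(f)
    print(evaluate(data["tasks"], data["exemplars"]))
```

The same loop can be pointed at different `model` identifiers to reproduce the kind of cross-model comparison the abstract describes, with accuracy as the simplest possible score.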

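The abstract states that WaterGPT was produced by fine-tuning Llama3-8B but gives no training details, so the sketch below shows one common recipe: supervised fine-tuning with LoRA adapters via Hugging Face `datasets`, `peft`, and `trl`. The dataset file `water_sft.jsonl`, the hyperparameters, and the choice of LoRA are assumptions, not the authors' pipeline.

```python
# Sketch of one way to fine-tune Llama3-8B into a domain model; the dataset
# path and hyperparameters are illustrative, not taken from the paper.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical JSONL file whose "text" column holds instruction-response pairs.
dataset = load_dataset("json", data_files="water_sft.jsonl", split="train")

peft_config = LoraConfig(
    r=16,                 # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="watergpt-8b-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # base checkpoint to adapt
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_args,
)
trainer.train()
trainer.save_model()  # writes the LoRA adapter next to the tokenizer files
```

The trained adapter can later be merged into the base weights for deployment and then scored with the same evaluation loop used for the foundational models.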

Source Journal

Environmental Science & Technology Letters
CiteScore: 17.90
Self-citation rate: 3.70%
Articles published: 163
About the journal: Environmental Science & Technology Letters serves as an international forum for brief communications on experimental or theoretical results of exceptional timeliness in all aspects of environmental science, both pure and applied. Published as soon as accepted, these communications are summarized in monthly issues. Additionally, the journal features short reviews on emerging topics in environmental science and technology.