Enabling GPTs for Expert-Level Environmental Engineering Question Answering

Impact factor 8.9 · CAS Tier 2 (Environmental Science & Ecology) · JCR Q1 (Engineering, Environmental)
Jun-Jie Zhu, Meiqi Yang, Jinyue Jiang, Yiming Bai, Danqi Chen and Zhiyong Jason Ren*
Environmental Science & Technology Letters, 2024, 11(12), 1327–1333. Published online 2024-11-07.
DOI: 10.1021/acs.estlett.4c00665 — https://pubs.acs.org/doi/10.1021/acs.estlett.4c00665
Citations: 0

Abstract

Artificial intelligence (AI) holds significant potential for advancing research and development in the field of environmental science and engineering (ESE), but the development of domain-specific large language models (LLMs) in this field has not been reported. This study addresses this gap by evaluating the performance of advanced LLMs in answering expert-level, closed-book environmental engineering questions. We assessed two generative pretrained transformer (GPT) models and five fine-tuned models (FTMs) on an expert-level question answering (QA) data set, focusing on relevance (from 0 to 1), factuality (0 to 1), format, richness, QA difficulty level, and domain topic. Results show that GPT-4 achieves a relevance score of 0.644 and a factuality score of 0.791 based on 286 questions, indicating room for improvement, particularly for more difficult questions (scores dropped to below 0.5). Notably, FTMs with larger data sets resisted factuality degradation, highlighting the need for high-quality training materials. Inaccuracies and format issues are often linked to overtraining and catastrophic interference. This first investigation leverages expert-level textbooks to enhance LLM performance, thereby providing valuable insights and setting the stage for developing more robust domain-specific LLMs for environmental applications.
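The evaluation protocol above scores each model answer on a 0-to-1 scale for relevance and factuality, then examines how scores vary with question difficulty. A minimal sketch of that kind of aggregation is shown below; the record fields, difficulty labels, and sample values are illustrative assumptions, not data or code from the paper.

```python
from statistics import mean

# Hypothetical per-question records: each answer carries a relevance and a
# factuality score on a 0-1 scale, plus a difficulty tag (values illustrative).
records = [
    {"relevance": 0.70, "factuality": 0.85, "difficulty": "easy"},
    {"relevance": 0.62, "factuality": 0.78, "difficulty": "medium"},
    {"relevance": 0.45, "factuality": 0.49, "difficulty": "hard"},
]

def summarize(records, metric):
    """Return the overall mean of `metric` and a per-difficulty breakdown."""
    overall = mean(r[metric] for r in records)
    by_level = {}
    for r in records:
        by_level.setdefault(r["difficulty"], []).append(r[metric])
    return overall, {level: mean(vals) for level, vals in by_level.items()}

overall_rel, rel_by_level = summarize(records, "relevance")
overall_fact, fact_by_level = summarize(records, "factuality")
```

Breaking the means out by difficulty is what surfaces the pattern the abstract reports: aggregate scores can look acceptable while the hardest subset falls below 0.5.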


Source journal

Environmental Science & Technology Letters (JCR categories: Engineering, Environmental; Environmental Sciences)
CiteScore: 17.90
Self-citation rate: 3.70%
Articles published per year: 163
Journal description: Environmental Science & Technology Letters serves as an international forum for brief communications on experimental or theoretical results of exceptional timeliness in all aspects of environmental science, both pure and applied. Published as soon as accepted, these communications are summarized in monthly issues. Additionally, the journal features short reviews on emerging topics in environmental science and technology.