Using large language models for solving textbook-style thermodynamic problems

Impact Factor 3.9 · CAS Category 2 (Engineering) · JCR Q2, Computer Science, Interdisciplinary Applications
Rébecca Loubet, Pascal Zittlau, Luisa Vollmer, Marco Hoffmann, Sophie Fellenz, Fabian Jirasek, Heike Leitte, Hans Hasse
{"title":"Using large language models for solving textbook-style thermodynamic problems","authors":"Rébecca Loubet ,&nbsp;Pascal Zittlau ,&nbsp;Luisa Vollmer ,&nbsp;Marco Hoffmann ,&nbsp;Sophie Fellenz ,&nbsp;Fabian Jirasek ,&nbsp;Heike Leitte ,&nbsp;Hans Hasse","doi":"10.1016/j.compchemeng.2025.109333","DOIUrl":null,"url":null,"abstract":"<div><div>Large Language Models (LLMs) have made significant progress in reasoning, demonstrating their capability to generate human-like responses. This study analyzes the problem-solving capabilities of LLMs in the domain of thermodynamics. A benchmark of 22 textbook-style thermodynamic problems to evaluate LLMs is presented that contains both simple and advanced problems. Five different LLMs are assessed: GPT-3.5, GPT-4, and GPT-4o from OpenAI, Llama 3.1 from Meta, and le Chat from MistralAI. The answers of these LLMs were evaluated by trained human experts, following a methodology akin to the grading of academic exam responses. The scores and the consistency of the answers are discussed, together with the analytical skills of the LLMs. Both strengths and weaknesses of the LLMs become evident. They generally yield good results for the simple problems, but also limitations become clear: The LLMs do not provide consistent results, they often fail to fully comprehend the context and make wrong assumptions. Given the complexity and domain-specific nature of the problems, the statistical language modeling approach of the LLMs struggles with the accurate interpretation and the required reasoning. The present results highlight the need for more systematic integration of thermodynamic knowledge with LLMs, for example, by using knowledge-based methods.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"204 ","pages":"Article 109333"},"PeriodicalIF":3.9000,"publicationDate":"2025-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425003357","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Large Language Models (LLMs) have made significant progress in reasoning, demonstrating their capability to generate human-like responses. This study analyzes the problem-solving capabilities of LLMs in the domain of thermodynamics. A benchmark of 22 textbook-style thermodynamic problems, containing both simple and advanced problems, is presented for evaluating LLMs. Five different LLMs are assessed: GPT-3.5, GPT-4, and GPT-4o from OpenAI, Llama 3.1 from Meta, and le Chat from MistralAI. The answers of these LLMs were evaluated by trained human experts, following a methodology akin to the grading of academic exam responses. The scores and the consistency of the answers are discussed, together with the analytical skills of the LLMs. Both strengths and weaknesses of the LLMs become evident. They generally yield good results for the simple problems, but their limitations also become clear: the LLMs do not provide consistent results, and they often fail to fully comprehend the context and make wrong assumptions. Given the complexity and domain-specific nature of the problems, the statistical language modeling approach of the LLMs struggles with accurate interpretation and the required reasoning. The present results highlight the need for more systematic integration of thermodynamic knowledge with LLMs, for example, by using knowledge-based methods.
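To make the evaluation procedure described in the abstract concrete, the following is a minimal sketch (not the authors' code) of how a single benchmark problem might be posed repeatedly to an LLM and how expert-assigned grades could be summarized to assess both the score level and the consistency of the answers. The helper `query_llm` is a hypothetical placeholder for a call to whichever provider SDK is used (OpenAI, Meta, MistralAI); in the study itself, grading was performed by trained human experts, which the `grade` callable stands in for here.

```python
# Sketch only: repeated querying of one benchmark problem and summary of the
# expert-assigned grades. `query_llm` and `grade` are hypothetical placeholders.

from statistics import mean, pstdev
from typing import Callable


def evaluate_problem(
    problem: str,
    query_llm: Callable[[str], str],  # hypothetical wrapper around a provider SDK
    grade: Callable[[str], float],    # expert-assigned score, e.g. in [0, 1]
    n_repeats: int = 5,
) -> dict:
    """Pose the same problem several times and summarize the graded scores."""
    scores = []
    for _ in range(n_repeats):
        answer = query_llm(problem)   # each call may yield a different answer
        scores.append(grade(answer))  # a human expert grades each answer
    return {
        "mean_score": mean(scores),
        "score_spread": pstdev(scores),  # a large spread indicates inconsistency
        "scores": scores,
    }
```

A large spread across repeated runs of the same problem would correspond to the inconsistency of answers that the study reports for the tested LLMs.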


Source journal: Computers & Chemical Engineering (Engineering: Chemical)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.