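Integrating domain-specific knowledge and fine-tuned general-purpose large language models for question-answering in construction engineering management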

IF 9.6 | Zone 1, Engineering & Technology | Q1, Construction & Building Technology
Shenghua Zhou, Xuefan Liu, Dezhi Li, Tiantian Gu, Keyan Liu, Yifan Yang, Mun On Wong
Automation in Construction, vol. 175, Article 106206. Published 2025-04-21. DOI: 10.1016/j.autcon.2025.106206
https://www.sciencedirect.com/science/article/pii/S0926580525002468
Citations: 0

Abstract

General-purpose Large Language Models (GLLMs) for Question-Answering (QA) of Construction Engineering Management (CEM) usually lack CEM knowledge and fine-tuning datasets, leading to unsatisfactory performance. Hence, this paper integrates the CEM External Knowledge Base (CEM-EKB) with out-of-domain fine-tuned GLLMs for CEM-QA. It encompasses (i) devising a process to develop the CEM-EKB with 235 documents, (ii) conducting out-of-domain fine-tuning to enhance GLLMs' abilities, (iii) integrating CEM-EKB with fine-tuned GLLMs, (iv) building CEM-QA test datasets with 5050 Multiple-Choice Questions (MCQs) and 100 Case-Based Questions (CBQs), and (v) comparing GLLMs' performance. The results indicate that CEM knowledge-incorporated fine-tuned GLLMs surpass original GLLMs by an average of 27.1 % in professional examinations, with an average improvement of 27.5 % across 7 CEM subdomains and 22.05 % for CBQs. This paper contributes to devising an effective, reusable, and updatable CEM-EKB; revealing the feasibility of out-of-domain datasets for fine-tuning; and sharing a large-scale CEM-QA test dataset.
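The abstract does not say how the CEM-EKB is connected to the fine-tuned models or how MCQ answers are scored. As a rough illustration only, the sketch below assumes a retrieval-augmented setup: passages from a knowledge base are ranked by simple token overlap with the question, placed into the prompt, and the model's letter answers are scored by plain accuracy. The function names (`retrieve`, `build_prompt`, `accuracy`) and the token-overlap scorer are hypothetical stand-ins, not the authors' method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, k=2):
    """Return up to k documents sharing the most tokens with the question.

    Assumption: token overlap stands in for whatever retriever the paper uses.
    Documents with no shared tokens are dropped.
    """
    q_counts = Counter(tokenize(question))
    scored = []
    for doc in documents:
        d_counts = Counter(tokenize(doc))
        overlap = sum(min(q_counts[t], d_counts[t]) for t in q_counts)
        scored.append((overlap, doc))
    # Stable sort: ties keep their original knowledge-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, options, passages):
    """Assemble a multiple-choice prompt with retrieved passages as context."""
    context = "\n".join(f"- {p}" for p in passages)
    choices = "\n".join(f"{label}. {text}" for label, text in options.items())
    return (
        f"Reference material:\n{context}\n\n"
        f"Question: {question}\n{choices}\n"
        "Answer with one letter."
    )


def accuracy(predicted, gold):
    """Fraction of multiple-choice answers that match the answer key."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Under this assumption, the comparison reported in the abstract would amount to calling `accuracy` once on the original model's answers and once on the knowledge-augmented, fine-tuned model's answers over the same question set, then taking the difference.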
Source journal
Automation in Construction (Engineering & Technology: Civil Engineering)
CiteScore: 19.20
Self-citation rate: 16.50%
Publication count: 563
Review time: 8.5 months
Journal description: Automation in Construction is an international journal that publishes original research papers on the use of information technologies in the construction industry. Topics include design, engineering, construction technologies, and the maintenance and management of constructed facilities. Its scope covers every stage of the construction life cycle: initial planning and design, construction of the facility, operation and maintenance, and the eventual dismantling and recycling of buildings and engineering structures.