Optimizing large vision-language models for context-aware construction safety assessment

IF 11.5 · Zone 1 (Engineering & Technology) · Q1 CONSTRUCTION & BUILDING TECHNOLOGY
Taegeon Kim, Seokhwan Kim, Wei-Chih Chern, Somin Park, Daeho Kim, Hongjo Kim
{"title":"上下文感知建筑安全评估的大型视觉语言模型优化","authors":"Taegeon Kim ,&nbsp;Seokhwan Kim ,&nbsp;Wei-Chih Chern ,&nbsp;Somin Park ,&nbsp;Daeho Kim ,&nbsp;Hongjo Kim","doi":"10.1016/j.autcon.2025.106510","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a context-aware large vision-language model (LVLM) for automated construction site safety assessment, addressing the limitations of existing models in domain-specific hazard recognition. It introduces a framework that combines domain-specific image-text data generation, vision encoder fine-tuning for improved object recognition, and Low-Rank Adaptation (LoRA)-based model adjustment for context-aware safety reasoning. The model was evaluated on 400 images from 10 hazardous situations, demonstrating superior performance in the image captioning task (average ROUGE-L: 0.3852, SPICE: 0.3615, SBERT-based similarity: 0.7484). For safety assessment, the fine-tuned model achieved 94.25 % accuracy in predicting safety status, significantly outperforming GPT-4 V (53.25 %) and LLaVA 1.5 (48 %). The quality of textual justifications was assessed using both GPT-4 V-based and expert-based evaluations of relevance and preference. In both settings, the fine-tuned model received the highest scores, demonstrating robust and context-aware safety reasoning. These findings confirm that domain-specific fine-tuning enhances safety classification and hazard interpretation, advancing construction site monitoring.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"180 ","pages":"Article 106510"},"PeriodicalIF":11.5000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimizing large vision-language models for context-aware construction safety assessment\",\"authors\":\"Taegeon Kim ,&nbsp;Seokhwan Kim ,&nbsp;Wei-Chih Chern ,&nbsp;Somin Park ,&nbsp;Daeho Kim ,&nbsp;Hongjo Kim\",\"doi\":\"10.1016/j.autcon.2025.106510\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper presents a context-aware large vision-language model (LVLM) for automated construction site safety assessment, addressing the limitations of existing models in domain-specific hazard recognition. It introduces a framework that combines domain-specific image-text data generation, vision encoder fine-tuning for improved object recognition, and Low-Rank Adaptation (LoRA)-based model adjustment for context-aware safety reasoning. The model was evaluated on 400 images from 10 hazardous situations, demonstrating superior performance in the image captioning task (average ROUGE-L: 0.3852, SPICE: 0.3615, SBERT-based similarity: 0.7484). For safety assessment, the fine-tuned model achieved 94.25 % accuracy in predicting safety status, significantly outperforming GPT-4 V (53.25 %) and LLaVA 1.5 (48 %). The quality of textual justifications was assessed using both GPT-4 V-based and expert-based evaluations of relevance and preference. In both settings, the fine-tuned model received the highest scores, demonstrating robust and context-aware safety reasoning. 
These findings confirm that domain-specific fine-tuning enhances safety classification and hazard interpretation, advancing construction site monitoring.</div></div>\",\"PeriodicalId\":8660,\"journal\":{\"name\":\"Automation in Construction\",\"volume\":\"180 \",\"pages\":\"Article 106510\"},\"PeriodicalIF\":11.5000,\"publicationDate\":\"2025-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Automation in Construction\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0926580525005503\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CONSTRUCTION & BUILDING TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Automation in Construction","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0926580525005503","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CONSTRUCTION & BUILDING TECHNOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

This paper presents a context-aware large vision-language model (LVLM) for automated construction site safety assessment, addressing the limitations of existing models in domain-specific hazard recognition. It introduces a framework that combines domain-specific image-text data generation, vision encoder fine-tuning for improved object recognition, and Low-Rank Adaptation (LoRA)-based model adjustment for context-aware safety reasoning. The model was evaluated on 400 images from 10 hazardous situations, demonstrating superior performance in the image captioning task (average ROUGE-L: 0.3852, SPICE: 0.3615, SBERT-based similarity: 0.7484). For safety assessment, the fine-tuned model achieved 94.25% accuracy in predicting safety status, significantly outperforming GPT-4V (53.25%) and LLaVA 1.5 (48%). The quality of textual justifications was assessed using both GPT-4V-based and expert-based evaluations of relevance and preference. In both settings, the fine-tuned model received the highest scores, demonstrating robust and context-aware safety reasoning. These findings confirm that domain-specific fine-tuning enhances safety classification and hazard interpretation, advancing construction site monitoring.
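The LoRA-based adaptation step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example of attaching low-rank adapters to a LLaVA-1.5-style model using the Hugging Face transformers and peft libraries; the checkpoint name, target modules, and hyperparameters are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the authors' code): attaching LoRA adapters to a
# LLaVA-1.5-style vision-language model with Hugging Face transformers + peft.
# Checkpoint name, target modules, and hyperparameters are illustrative assumptions.
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"          # assumed base checkpoint
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # builds image-text inputs for training pairs

# Inject low-rank update matrices into the language model's attention
# projections, so only a small fraction of parameters is trained.
lora_config = LoraConfig(
    r=16,                                      # rank of the low-rank update (assumed)
    lora_alpha=32,                             # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],       # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()             # reports the small trainable fraction
```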
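The captioning metrics reported in the abstract (ROUGE-L and SBERT-based cosine similarity) can be computed with standard open-source tools. The sketch below uses the rouge-score and sentence-transformers packages; the example captions and the choice of SBERT checkpoint are assumptions, as the abstract does not specify them.

```python
# Minimal sketch (not the authors' evaluation script): scoring a generated
# caption against a reference with ROUGE-L and SBERT cosine similarity.
# The example captions and the SBERT checkpoint are assumptions.
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

reference = "A worker on the scaffold is not wearing a fall-protection harness."
generated = "The worker standing on the scaffolding lacks a safety harness."

# ROUGE-L: longest-common-subsequence overlap between reference and prediction.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure

# SBERT similarity: cosine similarity between sentence embeddings.
sbert = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
emb_ref, emb_gen = sbert.encode([reference, generated], convert_to_tensor=True)
sbert_sim = util.cos_sim(emb_ref, emb_gen).item()

print(f"ROUGE-L: {rouge_l:.4f}, SBERT similarity: {sbert_sim:.4f}")
```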
Source journal
Automation in Construction (Engineering & Technology – Civil Engineering)
CiteScore: 19.20
Self-citation rate: 16.50%
Articles published: 563
Review time: 8.5 months
Journal description: Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities. The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.