Enhancing visual-LLM for construction site safety compliance via prompt engineering and Bi-stage retrieval-augmented generation

IF 11.5 · CAS Zone 1 (Engineering & Technology) · JCR Q1 (Construction & Building Technology)
Koi Xiaowen Guo, Peter Kok-Yiu Wong, Jack C.P. Cheng, Chak-Fu Chan, Pak-Him Leung, Xingyu Tao
Automation in Construction, Volume 179, Article 106490. Journal Article. Published 2025-08-29.
DOI: 10.1016/j.autcon.2025.106490
https://www.sciencedirect.com/science/article/pii/S0926580525005308
Citations: 0

Abstract

The escalating frequency of safety incidents on construction sites requires an effective safety management framework. Traditional computer vision systems are constrained by their static nature, limited generalization, and inadequate semantic comprehension. This paper integrates a multi-modal Visual Language Model (VLM) with the authors' proposed Bi-stage Retrieval-Augmented Generation (BiRAG) framework, which enhances safety compliance monitoring based on construction site images and adapts to evolving safety standards at scale without tedious model fine-tuning. A TriPhased prompt (TPP) and a decision-tree-based compliance judgment prompt are designed to enhance the VLM's ability to interpret worker behaviors and safety compliance from site images. A context-aware chunking strategy and a hybrid retrieval algorithm are developed to improve the analysis against relevant safety regulations. Experiments with images collected from a real construction site in Hong Kong demonstrated a 7.73% increase in retrieval accuracy and an 11.66% improvement in compliance analysis accuracy, offering a holistic construction safety management solution.
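The abstract names a hybrid retrieval algorithm but gives no details of it. As a rough illustration of the general idea only, and not the authors' method, the sketch below ranks regulation chunks by a weighted sum of a keyword-overlap score and a cosine similarity. Everything here is an assumption made for illustration: the function names, the `alpha` weight, and the use of bag-of-words vectors as a stand-in for the learned embeddings a real system would likely use.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def lexical_score(query, chunk):
    """Fraction of query tokens that also occur in the chunk."""
    q_tokens = tokenize(query)
    if not q_tokens:
        return 0.0
    chunk_tokens = set(tokenize(chunk))
    return sum(1 for t in q_tokens if t in chunk_tokens) / len(q_tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_rank(query, chunks, alpha=0.5):
    """Rank chunks by alpha * lexical + (1 - alpha) * vector similarity.

    Assumption: bag-of-words counts stand in for dense embeddings.
    Returns a list of (score, chunk) pairs, best first.
    """
    q_vec = Counter(tokenize(query))
    scored = []
    for chunk in chunks:
        vec_sim = cosine(q_vec, Counter(tokenize(chunk)))
        score = alpha * lexical_score(query, chunk) + (1 - alpha) * vec_sim
        scored.append((score, chunk))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In such a setup, a text description of the site image produced by the VLM would serve as `query` and the chunked safety regulations as `chunks`; `alpha` trades exact term matching against broader similarity.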
Source journal
Automation in Construction (Engineering & Technology: Civil Engineering)
CiteScore: 19.20
Self-citation rate: 16.50%
Articles published: 563
Review time: 8.5 months
About the journal: Automation in Construction is an international journal that publishes original research papers on the use of information technologies in the construction industry. Topics include design, engineering, construction technologies, and the maintenance and management of constructed facilities. Its scope covers every stage of the construction life cycle: initial planning and design, construction of the facility, operation and maintenance, and the eventual dismantling and recycling of buildings and engineering structures.