Koi Xiaowen Guo, Peter Kok-Yiu Wong, Jack C.P. Cheng, Chak-Fu Chan, Pak-Him Leung, Xingyu Tao
Automation in Construction, Volume 179, Article 106490. Journal Article. Published 2025-08-29. DOI: 10.1016/j.autcon.2025.106490. Impact factor: 11.5; JCR Q1 (Construction & Building Technology). https://www.sciencedirect.com/science/article/pii/S0926580525005308
Enhancing visual-LLM for construction site safety compliance via prompt engineering and Bi-stage retrieval-augmented generation
The escalating frequency of safety incidents on construction sites requires an effective safety management framework. Traditional computer vision systems are constrained by their static nature, limited generalization, and inadequate semantic comprehension. This paper integrates a multi-modal Visual Language Model (VLM) with our proposed Bi-stage Retrieval-Augmented Generation (BiRAG) framework, which enhances safety compliance monitoring based on construction site images, with high scalability and adaptiveness to evolving safety standards without tedious model fine-tuning. A TriPhased prompt (TPP) and a decision-tree-based compliance judgment prompt are designed to enhance the VLM's ability to interpret worker behaviors and safety compliance from site images. A context-aware chunking strategy and hybrid retrieval algorithm are developed to improve the analysis against relevant safety regulations. Experiments with images collected from a real construction site in Hong Kong demonstrated a 7.73 % increase in retrieval accuracy and an 11.66 % improvement in compliance analysis accuracy, offering a holistic construction safety management solution.
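The abstract names a hybrid retrieval algorithm but does not describe how it works. As a rough illustration of the general idea only, the sketch below blends a lexical overlap score with a vector-similarity score to rank regulation chunks against a scene description. Everything in it is an assumption made for illustration: the function names, the `alpha` weight, the bag-of-words cosine standing in for an embedding model, and the sample chunks, which are invented placeholder text rather than real regulations. It is not the authors' BiRAG implementation.

```python
import math
from collections import Counter

# Invented placeholder chunks, NOT real regulation text.
REGULATION_CHUNKS = [
    "Workers at height must wear a safety harness attached to an anchor point.",
    "Safety helmets must be worn at all times within the site boundary.",
    "Excavations deeper than a set depth must be shored or fenced.",
]

def tokenize(text):
    """Lowercase and split on whitespace after dropping commas and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()

def keyword_score(query, chunk):
    """Lexical signal: fraction of distinct query terms found in the chunk."""
    q_terms, c_terms = set(tokenize(query)), set(tokenize(chunk))
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0

def cosine_score(query, chunk):
    """Cosine similarity of term-count vectors (a stand-in for dense embeddings)."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    dot = sum(q[t] * c[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0

def hybrid_retrieve(query, chunks, alpha=0.5, top_k=2):
    """Rank chunks by a weighted blend of the two scores; alpha is hypothetical."""
    scored = [
        (alpha * keyword_score(query, ch) + (1 - alpha) * cosine_score(query, ch), ch)
        for ch in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    # A scene description such as a VLM might produce from a site image.
    description = "worker on scaffold without safety harness"
    for score, chunk in hybrid_retrieve(description, REGULATION_CHUNKS):
        print(f"{score:.3f}  {chunk}")
```

In a real pipeline the cosine term would come from an embedding model and the lexical term from a ranking function such as BM25; the blend weight would need tuning on labeled retrieval data.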
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.