Chak-Fu Chan , Peter Kok-Yiu Wong , Xiaowen Guo , Jack C.P. Cheng , Jolly Pui-Ching Chan , Pak-Him Leung , Xingyu Tao
{"title":"基于领域本体的建筑工地安全监测环境感知视觉语言模型智能体","authors":"Chak-Fu Chan , Peter Kok-Yiu Wong , Xiaowen Guo , Jack C.P. Cheng , Jolly Pui-Ching Chan , Pak-Him Leung , Xingyu Tao","doi":"10.1016/j.autcon.2025.106305","DOIUrl":null,"url":null,"abstract":"<div><div>Traditional approaches of construction site safety monitoring heavily rely on manual on-site inspection, which are prone to overlooked incidents. Existing computer vision methods require time-consuming and case-by-case data labeling, and lack high-level reasoning capability. This paper develops a human-alike virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) To efficiently generate image-text data for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) To optimize a virtual agent from pre-trained to domain-tailored, a two-stage curriculum learning paradigm is designed to enhance model fine-tuning effectiveness toward domain-specific tasks; (3) To inject construction-domain knowledge more effectively into the virtual agent, a hierarchical prompting framework driven by a construction safety ontology is developed for more domain-tailored reasoning capability. The virtual agent has been deployed on a real construction site for real-time video analytics, with over 90 % accuracy in identifying violations of work-at-height safety regulations.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"177 ","pages":"Article 106305"},"PeriodicalIF":11.5000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Context-aware vision-language model agent enriched with domain-specific ontology for construction site safety monitoring\",\"authors\":\"Chak-Fu Chan , Peter Kok-Yiu Wong , Xiaowen Guo , Jack C.P. Cheng , Jolly Pui-Ching Chan , Pak-Him Leung , Xingyu Tao\",\"doi\":\"10.1016/j.autcon.2025.106305\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Traditional approaches of construction site safety monitoring heavily rely on manual on-site inspection, which are prone to overlooked incidents. Existing computer vision methods require time-consuming and case-by-case data labeling, and lack high-level reasoning capability. This paper develops a human-alike virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) To efficiently generate image-text data for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) To optimize a virtual agent from pre-trained to domain-tailored, a two-stage curriculum learning paradigm is designed to enhance model fine-tuning effectiveness toward domain-specific tasks; (3) To inject construction-domain knowledge more effectively into the virtual agent, a hierarchical prompting framework driven by a construction safety ontology is developed for more domain-tailored reasoning capability. The virtual agent has been deployed on a real construction site for real-time video analytics, with over 90 % accuracy in identifying violations of work-at-height safety regulations.</div></div>\",\"PeriodicalId\":8660,\"journal\":{\"name\":\"Automation in Construction\",\"volume\":\"177 \",\"pages\":\"Article 106305\"},\"PeriodicalIF\":11.5000,\"publicationDate\":\"2025-06-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Automation in Construction\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0926580525003450\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CONSTRUCTION & BUILDING TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Automation in Construction","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0926580525003450","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CONSTRUCTION & BUILDING TECHNOLOGY","Score":null,"Total":0}
Context-aware vision-language model agent enriched with domain-specific ontology for construction site safety monitoring
Traditional approaches of construction site safety monitoring heavily rely on manual on-site inspection, which are prone to overlooked incidents. Existing computer vision methods require time-consuming and case-by-case data labeling, and lack high-level reasoning capability. This paper develops a human-alike virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) To efficiently generate image-text data for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) To optimize a virtual agent from pre-trained to domain-tailored, a two-stage curriculum learning paradigm is designed to enhance model fine-tuning effectiveness toward domain-specific tasks; (3) To inject construction-domain knowledge more effectively into the virtual agent, a hierarchical prompting framework driven by a construction safety ontology is developed for more domain-tailored reasoning capability. The virtual agent has been deployed on a real construction site for real-time video analytics, with over 90 % accuracy in identifying violations of work-at-height safety regulations.
期刊介绍:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.