Context-aware vision-language model agent enriched with domain-specific ontology for construction site safety monitoring

IF 11.5; Zone 1 (Engineering Technology); JCR Q1, CONSTRUCTION & BUILDING TECHNOLOGY

Automation in Construction, Pub Date: 2025-06-02, DOI: 10.1016/j.autcon.2025.106305

Chak-Fu Chan, Peter Kok-Yiu Wong, Xiaowen Guo, Jack C.P. Cheng, Jolly Pui-Ching Chan, Pak-Him Leung, Xingyu Tao

Volume 177, Article 106305. Full text: https://www.sciencedirect.com/science/article/pii/S0926580525003450

Citations: 0

Abstract



Context-aware vision-language model agent enriched with domain-specific ontology for construction site safety monitoring

Traditional approaches to construction site safety monitoring rely heavily on manual on-site inspection, which is prone to overlooking incidents. Existing computer vision methods require time-consuming, case-by-case data labeling and lack high-level reasoning capability. This paper develops a human-like virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) to efficiently generate image-text data for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) to adapt the virtual agent from a pre-trained model to a domain-tailored one, a two-stage curriculum learning paradigm is designed to make fine-tuning more effective for domain-specific tasks; (3) to inject construction-domain knowledge more effectively into the virtual agent, a hierarchical prompting framework driven by a construction safety ontology is developed to give the agent more domain-tailored reasoning capability. The virtual agent has been deployed on a real construction site for real-time video analytics, with over 90% accuracy in identifying violations of work-at-height safety regulations.
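
The abstract does not give the ontology or the prompt templates, so the following is only a rough sketch of what ontology-driven hierarchical prompting could look like. The toy ontology, its entities and rules, and the function name `build_hierarchical_prompt` are all hypothetical and are not taken from the paper; the sketch only shows the general idea of moving from scenario context, to entities, to rule-level checks.

```python
# Hypothetical toy ontology: every class, entity, and rule below is invented
# for illustration and is NOT taken from the paper.
TOY_ONTOLOGY = {
    "work_at_height": {
        "description": "Tasks performed where a person could fall from a height.",
        "entities": ["worker", "scaffold", "ladder", "guardrail", "safety harness"],
        "rules": [
            "A worker on a scaffold must wear a safety harness.",
            "Open edges must be protected by guardrails.",
        ],
    },
}


def build_hierarchical_prompt(ontology: dict, scenario: str) -> str:
    """Assemble a text prompt that moves from general context to specific checks."""
    node = ontology[scenario]
    lines = [
        # Level 1: the scenario sets the overall context.
        f"Scenario: {scenario.replace('_', ' ')}. {node['description']}",
        # Level 2: the entities the model should look for in the frame.
        "Entities to locate: " + ", ".join(node["entities"]) + ".",
        # Level 3: one compliance question per rule.
        "For each rule, answer 'compliant' or 'violation' with a short reason:",
    ]
    lines += [f"{i}. {rule}" for i, rule in enumerate(node["rules"], start=1)]
    return "\n".join(lines)


prompt = build_hierarchical_prompt(TOY_ONTOLOGY, "work_at_height")
print(prompt)
```

In a setup like this, the resulting string would be sent to a vision-language model together with a video frame; which model and API the authors use is not stated in the abstract.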


Source journal

Automation in Construction, Engineering Technology: Civil Engineering

CiteScore

19.20

Self-citation rate

16.50%

Articles published

563

Review time

8.5 months

Journal description: Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities. The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.