Context-aware vision-language model agent enriched with domain-specific ontology for construction site safety monitoring

Impact factor 11.5 | Zone 1 (Engineering & Technology) | JCR Q1 (Construction & Building Technology)
Chak-Fu Chan, Peter Kok-Yiu Wong, Xiaowen Guo, Jack C.P. Cheng, Jolly Pui-Ching Chan, Pak-Him Leung, Xingyu Tao
DOI: 10.1016/j.autcon.2025.106305
Journal: Automation in Construction, Volume 177, Article 106305
Publication date: 2025-06-02
Publication type: Journal Article
Open access: No
Full text: https://www.sciencedirect.com/science/article/pii/S0926580525003450
Citations: 0

Abstract

Traditional construction site safety monitoring relies heavily on manual on-site inspection, which is prone to overlooking incidents. Existing computer vision methods require time-consuming, case-by-case data labeling and lack high-level reasoning capability. This paper develops a human-like virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) to generate image-text data efficiently for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) to adapt the virtual agent from a pre-trained model to a domain-tailored one, a two-stage curriculum learning paradigm is designed to make fine-tuning more effective for domain-specific tasks; (3) to inject construction-domain knowledge into the virtual agent more effectively, a hierarchical prompting framework driven by a construction safety ontology is developed to support domain-tailored reasoning. The virtual agent has been deployed on a real construction site for real-time video analytics, achieving over 90% accuracy in identifying violations of work-at-height safety regulations.
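The abstract does not specify how the ontology drives the prompts, so the following is only a minimal sketch of one plausible reading: a small concept tree is walked top-down, and each node contributes one question, moving from general to specific. The class name OntologyNode, the function build_prompts, and every concept and question in the toy hierarchy are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One concept in a toy safety ontology (hypothetical, not from the paper)."""
    name: str
    question: str  # what a vision-language model would be asked at this level
    children: list["OntologyNode"] = field(default_factory=list)


def build_prompts(node: OntologyNode, path: tuple[str, ...] = ()) -> list[str]:
    """Walk the ontology depth-first and emit one prompt per concept."""
    here = path + (node.name,)
    prompts = [f"[{' > '.join(here)}] {node.question}"]
    for child in node.children:
        prompts.extend(build_prompts(child, here))
    return prompts


# Illustrative hierarchy only; the real ontology is not given in the abstract.
ontology = OntologyNode(
    "Work at height",
    "Is any worker operating on an elevated platform or structure?",
    [
        OntologyNode(
            "Fall protection",
            "Is each elevated worker wearing a safety harness?",
            [
                OntologyNode(
                    "Anchorage",
                    "Is the harness lanyard attached to an anchor point?",
                )
            ],
        ),
        OntologyNode(
            "Edge protection",
            "Are guardrails present along open edges?",
        ),
    ],
)

prompts = build_prompts(ontology)
for prompt in prompts:
    print(prompt)
```

In a setup like this, each prompt would be sent to the model together with a video frame, and a deployment could stop descending a branch once a parent question is answered "no". That pruning step is an assumption about how such a hierarchy might be used, not something the abstract describes.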
Source journal
Automation in Construction (Engineering & Technology: Civil Engineering)
CiteScore: 19.20
Self-citation rate: 16.50%
Articles published: 563
Review time: 8.5 months
About the journal: Automation in Construction is an international journal that publishes original research papers on the use of information technologies in the construction industry. Topics include design, engineering, construction technologies, and the maintenance and management of constructed facilities. The scope covers every stage of the construction life cycle: initial planning and design, construction of the facility, operation and maintenance, and the eventual dismantling and recycling of buildings and engineering structures.