Construction safety inspection with contrastive language-image pre-training (CLIP) image captioning and attention

Impact factor 9.6 · Zone 1, Engineering & Technology · JCR Q1, Construction & Building Technology
Wei-Lun Tsai, Phuong-Linh Le, Wang-Fat Ho, Nai-Wen Chi, Jacob J. Lin, Shuai Tang, Shang-Hsien Hsieh
Journal: Automation in Construction, volume 169, Article 105863
Published: 2024-11-22
Publication type: Journal Article
DOI: 10.1016/j.autcon.2024.105863
URL: https://www.sciencedirect.com/science/article/pii/S0926580524005995
Citations: 0

Abstract

Traditional safety inspections require significant human effort and time to capture site photos and textual descriptions. While standardized forms and image captioning techniques have been explored to improve inspection efficiency, compiling reports with both visual and text data remains challenging due to the multiplicity of safety-related knowledge. To assist inspectors in evaluating violations more efficiently, this paper presents an image-language model, utilizing Contrastive Language-Image Pre-training (CLIP) fine-tuning and prefix captioning to automatically generate safety observations. A user-friendly mobile phone application has been created to streamline safety report documentation for site engineers. The language model successfully classifies nine violation types with an average accuracy of 73.7%, outperforming the baseline model by 41.8%. Experiment participants confirmed that the mobile application is helpful for safety inspections. This automated framework simplifies safety documentation by identifying violation scenes through images, improves overall safety performance, and supports the digital transformation of construction sites.
Source journal: Automation in Construction (Engineering & Technology; Engineering, Civil)
CiteScore: 19.20
Self-citation rate: 16.50%
Articles published: 563
Review time: 8.5 months
About the journal: Automation in Construction is an international journal that publishes original research papers on the use of information technologies in the construction industry. Topics include design, engineering, construction technologies, and the maintenance and management of constructed facilities. The scope covers all stages of the construction life cycle: initial planning and design, construction of the facility, operation and maintenance, and the eventual dismantling and recycling of buildings and engineering structures.