Construction safety inspection with contrastive language-image pre-training (CLIP) image captioning and attention

IF 11.5, Zone 1 (Engineering & Technology), JCR Q1 (CONSTRUCTION & BUILDING TECHNOLOGY)

Automation in Construction. Pub Date: 2024-11-22. DOI: 10.1016/j.autcon.2024.105863

Wei-Lun Tsai, Phuong-Linh Le, Wang-Fat Ho, Nai-Wen Chi, Jacob J. Lin, Shuai Tang, Shang-Hsien Hsieh

Automation in Construction, Volume 169, Article 105863. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0926580524005995

Citations: 0

Abstract

Traditional safety inspections require significant human effort and time to capture site photos and textual descriptions. While standardized forms and image captioning techniques have been explored to improve inspection efficiency, compiling reports with both visual and text data remains challenging due to the multiplicity of safety-related knowledge. To assist inspectors in evaluating violations more efficiently, this paper presents an image-language model, utilizing Contrastive Language-Image Pre-training (CLIP) fine-tuning and prefix captioning to automatically generate safety observations. A user-friendly mobile phone application has been created to streamline safety report documentation for site engineers. The language model successfully classifies nine violation types with an average accuracy of 73.7%, outperforming the baseline model by 41.8%. Experiment participants confirmed that the mobile application is helpful for safety inspections. This automated framework simplifies safety documentation by identifying violation scenes through images, improves overall safety performance, and supports the digital transformation of construction sites.
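The abstract does not say whether the 41.8% margin over the baseline is in percentage points or relative to the baseline's accuracy. As a rough sketch only (the function name `implied_baselines` is an assumption introduced here for illustration and does not come from the paper), the two readings imply noticeably different baseline accuracies:

```python
def implied_baselines(accuracy_pct: float, margin_pct: float) -> tuple[float, float]:
    """Return the baseline accuracy implied by each reading of the margin.

    First value: margin read as percentage points (accuracy - margin).
    Second value: margin read as a relative improvement
    (accuracy = baseline * (1 + margin / 100)).
    """
    points_reading = accuracy_pct - margin_pct
    relative_reading = accuracy_pct / (1 + margin_pct / 100)
    return round(points_reading, 1), round(relative_reading, 1)


# Figures quoted in the abstract: 73.7% average accuracy, 41.8% margin.
points_reading, relative_reading = implied_baselines(73.7, 41.8)
print(f"baseline if percentage points: {points_reading}%")
print(f"baseline if relative gain:     {relative_reading}%")
```

Under the percentage-point reading the baseline would sit near 31.9%, and under the relative reading near 52.0%. The full paper should be consulted to see which one the authors intend.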

Source journal

Automation in Construction (Engineering & Technology: Civil Engineering)

CiteScore: 19.20
Self-citation rate: 16.50%
Articles published: 563
Review time: 8.5 months

About the journal: Automation in Construction is an international journal that publishes original research papers on the use of Information Technologies in the construction industry. Topics include design, engineering, construction technologies, and the maintenance and management of constructed facilities. Its scope covers every stage of the construction life cycle: initial planning and design, construction of the facility, operation and maintenance, and the eventual dismantling and recycling of buildings and engineering structures.