Wei-Lun Tsai, Phuong-Linh Le, Wang-Fat Ho, Nai-Wen Chi, Jacob J. Lin, Shuai Tang, Shang-Hsien Hsieh
Automation in Construction, Volume 169, Article 105863. Published 2024-11-22. DOI: 10.1016/j.autcon.2024.105863
https://www.sciencedirect.com/science/article/pii/S0926580524005995
Construction safety inspection with contrastive language-image pre-training (CLIP) image captioning and attention
Traditional safety inspections require significant human effort and time to capture site photos and textual descriptions. While standardized forms and image captioning techniques have been explored to improve inspection efficiency, compiling reports with both visual and text data remains challenging due to the multiplicity of safety-related knowledge. To assist inspectors in evaluating violations more efficiently, this paper presents an image-language model, utilizing Contrastive Language-Image Pre-training (CLIP) fine-tuning and prefix captioning to automatically generate safety observations. A user-friendly mobile phone application has been created to streamline safety report documentation for site engineers. The language model successfully classifies nine violation types with an average accuracy of 73.7%, outperforming the baseline model by 41.8%. Experiment participants confirmed that the mobile application is helpful for safety inspections. This automated framework simplifies safety documentation by identifying violation scenes through images, improves overall safety performance, and supports the digital transformation of construction sites.
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.