Visual Question Answering-based Referring Expression Segmentation for construction safety analysis

IF 9.6; Zone 1, Engineering & Technology; Q1, CONSTRUCTION & BUILDING TECHNOLOGY

Automation in Construction. Pub Date: 2025-03-26. DOI: 10.1016/j.autcon.2025.106127

Dai Quoc Tran, Armstrong Aboah, Yuntae Jeon, Minh-Truyen Do, Mohamed Abdel-Aty, Minsoo Park, Seunghee Park

Automation in Construction, volume 174, Article 106127. https://www.sciencedirect.com/science/article/pii/S0926580525001670

Citations: 0

Abstract

Despite advancements in computer vision techniques such as object detection and segmentation, a significant gap remains in leveraging these technologies for hazard recognition through natural language processing. To address this gap, this paper proposes VQA-RESCon, an approach that combines Visual Question Answering (VQA) and Referring Expression Segmentation (RES) to enhance construction safety analysis. By leveraging the visual grounding capabilities of RES, our method not only identifies potential hazards through VQA but also precisely localizes and highlights these hazards within the image. The method uses a large “scenario-questions” dataset comprising 200,000 images and 16 targeted questions to train a vision-and-language transformer model. In addition, post-processing is performed with ClipSeg and the Segment Anything Model. The validation results indicate that both the VQA and RES models are reliable and precise: the VQA model achieves an F1 score surpassing 90%, while the segmentation models achieve a Mean Intersection over Union of 57%.
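
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an F1 score and a Mean Intersection over Union are commonly computed. It is not the authors' evaluation code. The function names, the 0/1 encoding of answers, and the choice to average IoU per image (rather than per class) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask as rows of 0/1 values (illustrative type)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 from 0/1 labels, e.g. no/yes answers to one safety question."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 here when there are no positives.
    return 2 * tp / denom if denom else 0.0


def iou(mask_true: Mask, mask_pred: Mask) -> float:
    """Intersection over Union of two equally sized binary masks."""
    inter = 0
    union = 0
    for row_t, row_p in zip(mask_true, mask_pred):
        for t, p in zip(row_t, row_p):
            inter += 1 if (t and p) else 0
            union += 1 if (t or p) else 0
    # Convention (an assumption): two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (ground truth, prediction) mask pairs."""
    if not pairs:
        raise ValueError("mean_iou needs at least one mask pair")
    return sum(iou(t, p) for t, p in pairs) / len(pairs)


if __name__ == "__main__":
    truth = [[1, 1], [0, 0]]
    pred = [[1, 0], [1, 0]]
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
    print(mean_iou([(truth, truth), (truth, pred)]))
```

Under this reading, an F1 above 90% means the answers to the safety questions rarely miss or falsely flag a hazard, while a Mean IoU of 57% means the predicted hazard masks overlap the ground-truth regions by a little more than half on average.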

Source journal

Automation in Construction (Engineering & Technology: Civil Engineering)

CiteScore: 19.20
Self-citation rate: 16.50%
Articles published: 563
Review time: 8.5 months

About the journal: Automation in Construction is an international journal that publishes original research on the use of information technologies across the construction industry. Topics include design, engineering, construction technologies, and the maintenance and management of constructed facilities. Its scope spans every stage of the construction life cycle: initial planning and design, construction of the facility, operation and maintenance, and the eventual dismantling and recycling of buildings and engineering structures.