Chengzhang Chai, Yan Gao, Guanyu Xiong, Jiucai Liu, Haijiang Li
Domain knowledge-driven image captioning for bridge damage description generation
Automation in Construction, vol. 174, Article 106116. Published 2025-03-14.
DOI: 10.1016/j.autcon.2025.106116
URL: https://www.sciencedirect.com/science/article/pii/S0926580525001566
Journal impact factor: 9.6 (JCR Q1, Construction & Building Technology)
Citations: 0
Abstract
Deep learning-based bridge visual inspection often produces limited outputs, lacking the accurate descriptions required for practical assessments. Researchers have explored multimodal approaches to generate damage descriptions, but existing models are prone to hallucination and face challenges related to feature representation sufficiency, attention mechanism flexibility, and domain-specific knowledge integration. This paper develops an image captioning framework driven by domain knowledge to address these issues. It incorporates a multi-level feature fusion module that adaptively integrates Faster R-CNN trained weights (domain knowledge) with a CNN architecture. Additionally, it introduces a correlation-aware attention mechanism to dynamically capture interdependencies between image regions and optimise the attentional focus during LSTM decoding. Experimental results show that the proposed framework achieves higher BLEU scores and improves image-text alignment as verified through attention heatmaps. While the framework enhances inspection efficiency and quality, further dataset expansion and broader domain validation are required to assess its generalisation ability.
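The abstract reports its results as BLEU scores. As a reminder of what that metric measures, the following is a minimal sketch of sentence-level BLEU: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It assumes a single reference caption, whitespace tokenisation, and no smoothing. The abstract does not say which evaluation tooling the authors used, and the example captions are hypothetical, not taken from the paper's dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical damage captions, for illustration only.
    generated = "transverse crack on the concrete deck"
    ground_truth = "transverse crack on the concrete deck surface"
    print(round(sentence_bleu(generated, ground_truth), 3))
```

In this example every n-gram of the generated caption appears in the reference, so the score is lowered only by the brevity penalty for the missing final word. Short captions often share no 4-gram with the reference, which drives unsmoothed BLEU to zero; that is one reason captioning work commonly reports BLEU-1 to BLEU-4 separately or at corpus level.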
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.