Transformer-based large vision model for universal structural damage segmentation
Authors: Yang Xu, Chuao Zhang, Hui Li
Journal: Automation in Construction, vol. 176, Article 106256
Publication date: 2025-05-08
DOI: 10.1016/j.autcon.2025.106256
URL: https://www.sciencedirect.com/science/article/pii/S0926580525002961
Journal impact factor: 9.6 (JCR Q1, Construction & Building Technology)
Citation count: 0
Abstract
Current structural damage segmentation models are typically trained on large sets of pixel-level labels for specific structural components and damage types, which ties them to those labeled cases. To address this issue, this paper establishes a transformer-based large vision model for universal structural damage segmentation, combining a frozen, pre-trained transformer backbone with a fine-tuned CNN-based segmentation head. A composite loss function combining a correlation loss and a contrastive loss is proposed. A self-supervised correlation learning procedure is designed to ensure cross-level feature alignment. The contrastive loss across the student and teacher networks is designed to learn intra-instance similarity and inter-instance separability. The segmentation head is fine-tuned with a contrastive learning strategy that uses exponential moving average updates with momentum. The proposed method is validated on a multi-scale image dataset covering cable-supported bridges, concrete bridges, and post-earthquake buildings. The results demonstrate its recognition accuracy, generalization ability, robustness under complex backgrounds, and superiority over conventional supervised and unsupervised segmentation models.
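The abstract does not give the exact update rule or loss formula, so the following is only a minimal sketch of how a student-teacher contrastive setup with exponential moving average (EMA) momentum updating is commonly written. It assumes that the teacher weights track the student weights by EMA and substitutes a standard InfoNCE-style loss for the paper's contrastive term. The function names (`ema_update`, `info_nce`), the momentum value, and the temperature are all hypothetical choices made for illustration.

```python
import math


def ema_update(teacher, student, momentum=0.99):
    """Move teacher weights toward student weights by exponential moving average.

    Assumption: teacher <- momentum * teacher + (1 - momentum) * student.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(student_feats, teacher_feats, temperature=0.1):
    """InfoNCE-style contrastive loss (a stand-in, not the paper's exact loss).

    The i-th student feature is pulled toward the i-th teacher feature
    (intra-instance similarity) and pushed away from the other teacher
    features (inter-instance separability).
    """
    total = 0.0
    for i, s in enumerate(student_feats):
        logits = [cosine(s, t) / temperature for t in teacher_feats]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(student_feats)
```

In this sketch, one training step would compute `info_nce` on features from the two networks, apply a gradient step to the student only, and then call `ema_update` to refresh the teacher. The loss is close to zero when matching student and teacher features are aligned and grows when they are mismatched.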
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.