Yongqing Jiang, Jianze Wang, Xinyi Shen, Kaoshan Dai
Title: Large language model for post-earthquake structural damage assessment of buildings
DOI: 10.1111/mice.70010 (https://doi.org/10.1111/mice.70010)
Journal: Computer-Aided Civil and Infrastructure Engineering, vol. 46, no. 1 (Q1, Computer Science, Interdisciplinary Applications; impact factor 8.5)
Publication date: 2025-07-12 (Journal Article)
Citations: 0
Abstract
A rapid and accurate assessment of structural damage to buildings in the aftermath of earthquakes is critical to emergency response and engineering retrofit decisions. However, current in situ building damage assessment relies primarily on visual inspections by engineering professionals and on deep learning techniques that use single-modal information, which are time-consuming and unable to effectively integrate visual and textual information. In recent years, multimodal learning methods and large language models (LLMs), which can process both visual and linguistic information, have emerged as viable alternatives for damage assessment of buildings. In this study, a vision question-answering model for structural damage assessment (SDA-Chat) is developed that automatically generates professional textual interpretations of structural damage images via multi-round visual question-answering (VQA) interactions. A three-stage training strategy that includes instruction fine-tuning is designed to improve the model's VQA accuracy. A cross-modality projector based on dimension reshaping and a parallel network architecture is developed to improve the accuracy and speed of multimodal feature alignment. Comparative experiments against several advanced LLMs are conducted on a self-constructed dataset containing 8195 pairs of structural damage images and corresponding damage description texts. The results show that SDA-Chat can handle seven different tasks simultaneously, demonstrating the effectiveness of the proposed method. The model's highest question-answering accuracy and throughput reached 83.04% and 435.31 tokens/s, respectively. In addition, high-precision and lightweight variants are designed for different application scenarios.
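To make the abstract's "cross-modality projector based on dimension reshaping and parallel network architecture" concrete, the sketch below illustrates the general idea behind such projectors in vision-language models: a grid of visual patch features is reshaped into a token sequence, and each token is mapped into the language model's embedding space through parallel branches whose outputs are combined. This is a minimal illustrative sketch, not the paper's actual architecture; all shapes, weights, and function names here are assumptions, since the abstract gives no implementation details.

```python
# Hypothetical sketch of a cross-modality projector: reshape an H x W grid
# of visual patch features into a token sequence, then project each token
# to the text embedding dimension via two parallel linear branches whose
# outputs are averaged. All names and shapes are illustrative assumptions.

def linear(x, w):
    """Multiply vector x (length d_in) by weight matrix w (d_in x d_out)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]

def project_visual_tokens(patch_grid, w_a, w_b):
    """Flatten the patch grid (dimension reshaping: H x W x d_v -> H*W x d_v),
    then map each token through two parallel branches and average them
    (a stand-in for the parallel network architecture)."""
    tokens = [feat for row in patch_grid for feat in row]
    projected = []
    for t in tokens:
        a, b = linear(t, w_a), linear(t, w_b)
        projected.append([(x + y) / 2 for x, y in zip(a, b)])
    return projected

# Toy example: a 2x2 grid of 3-dim patch features -> 4 tokens of dim 2.
grid = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
w_a = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # branch A weights, 3 x 2
w_b = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]  # branch B weights, 3 x 2
out = project_visual_tokens(grid, w_a, w_b)
print(len(out), len(out[0]))  # prints: 4 2
```

In a real system the projected tokens would be concatenated with text-token embeddings and fed to the LLM; the parallel-branch design is one plausible way to trade a single large projection for cheaper concurrent ones, which is consistent with the abstract's emphasis on alignment speed.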
Journal introduction:
Computer-Aided Civil and Infrastructure Engineering stands as a scholarly, peer-reviewed archival journal, serving as a vital link between advancements in computer technology and civil and infrastructure engineering. The journal serves as a distinctive platform for the publication of original articles, spotlighting novel computational techniques and inventive applications of computers. Specifically, it concentrates on recent progress in computer and information technologies, fostering the development and application of emerging computing paradigms.
Encompassing a broad scope, the journal addresses bridge, construction, environmental, highway, geotechnical, structural, transportation, and water resources engineering. It extends its reach to the management of infrastructure systems, covering domains such as highways, bridges, pavements, airports, and utilities. The journal delves into areas like artificial intelligence, cognitive modeling, concurrent engineering, database management, distributed computing, evolutionary computing, fuzzy logic, genetic algorithms, geometric modeling, internet-based technologies, knowledge discovery and engineering, machine learning, mobile computing, multimedia technologies, networking, neural network computing, optimization and search, parallel processing, robotics, smart structures, software engineering, virtual reality, and visualization techniques.