{"title":"Large multimodal model assisted underground tunnel damage inspection and human-machine interaction","authors":"Yanzhi Qi , Zhi Ding , Yaozhi Luo","doi":"10.1016/j.iintel.2025.100154","DOIUrl":null,"url":null,"abstract":"<div><div>Artificial Intelligence is playing an increasingly important role in tunnel inspection as a core driver of the new generation of engineering. Traditional methods are difficult to directly generate human linguistic information and lack valid messages extracted from different modalities. This paper proposes Damage LMM, a multimodal damage detection model that can handle images or videos as well as text inputs, to realize fast damage identification and human-computer interaction. The visual instruction database is first created from real damage data collected using different visual sensors and captions extracted by a regional convolutional neural network. The basic language model is then fine-tuned into a specialised Damage LMM, which enhances user instructions by integrating virtual prompt injection and system messages. Finally, the enhanced prompts are processed through the tuned multimodal model to generate a detailed visual description of the damage. The performance of the method is evaluated using a real tunnel dataset, and the results show that it has better robustness and accuracy than other models in multimodal data, with an accuracy of 0.93 for the in-domain image data and a contextual correlation of 0.94. The proposed method can effectively identify tunnel defects and realize multimodal user interaction functions with a moderate number of markers and a short delay time, which will greatly help engineers to quickly obtain effective information and assess the degree of damage at the tunnel inspection site.</div></div>","PeriodicalId":100791,"journal":{"name":"Journal of Infrastructure Intelligence and Resilience","volume":"4 3","pages":"Article 100154"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Infrastructure Intelligence and Resilience","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772991525000179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Artificial intelligence is playing an increasingly important role in tunnel inspection as a core driver of the new generation of engineering. Traditional methods struggle to generate human-readable linguistic descriptions directly and cannot fully exploit information extracted from different modalities. This paper proposes Damage LMM, a multimodal damage detection model that accepts images or videos as well as text inputs, to realize fast damage identification and human-computer interaction. A visual instruction database is first created from real damage data collected with different visual sensors, paired with captions extracted by a region-based convolutional neural network. A base language model is then fine-tuned into a specialised Damage LMM, and user instructions are enhanced by integrating virtual prompt injection and system messages. Finally, the enhanced prompts are processed by the tuned multimodal model to generate a detailed visual description of the damage. The method is evaluated on a real tunnel dataset, and the results show better robustness and accuracy than other models on multimodal data, with an accuracy of 0.93 on in-domain image data and a contextual correlation of 0.94. The proposed method effectively identifies tunnel defects and supports multimodal user interaction with a moderate token count and short response latency, which will greatly help engineers quickly obtain effective information and assess the degree of damage at the tunnel inspection site.
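To make the pipeline described above more concrete, the sketch below illustrates, in Python, the two preparation steps the abstract names: assembling a visual-instruction record from detector captions and enhancing a user query with a system message plus a virtual-prompt suffix before it is sent to the fine-tuned multimodal model. This is a minimal illustration under assumed data formats, not the authors' implementation; all class, function, and field names (e.g. VisualInstructionSample, build_sample, enhance_prompt) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class VisualInstructionSample:
    """One entry of the (assumed) visual instruction database: an image path, the
    caption derived from region-based CNN detections, and an instruction/response
    pair used for supervised fine-tuning."""
    image_path: str
    detector_caption: str
    instruction: str = "Describe the damage visible in this tunnel image."
    response: str = ""


def build_sample(image_path: str, detections: list[dict]) -> VisualInstructionSample:
    """Turn raw detections (label, location, score) into a caption and a training sample."""
    caption = "; ".join(
        f"{d['label']} at {d['location']} (conf {d['score']:.2f})" for d in detections
    )
    response = f"The image shows {caption}." if detections else "No visible damage."
    return VisualInstructionSample(image_path, caption, response=response)


# Hypothetical system message and injected virtual prompt used to enhance user instructions.
SYSTEM_MESSAGE = (
    "You are a tunnel-inspection assistant. Report damage type, location, "
    "severity, and a suggested maintenance action."
)
VIRTUAL_PROMPT = "Answer concisely and flag any defect that may affect structural safety."


def enhance_prompt(user_prompt: str) -> list[dict]:
    """Combine the system message, the injected virtual prompt, and the user's
    question into a chat-style message list for the tuned multimodal model."""
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": f"{user_prompt}\n\n{VIRTUAL_PROMPT}"},
    ]


if __name__ == "__main__":
    sample = build_sample(
        "tunnel_001.jpg",
        [{"label": "longitudinal crack", "location": "crown", "score": 0.91}],
    )
    print(sample.response)
    print(enhance_prompt("What damage is present and how severe is it?"))
```

In this reading, the enhanced message list (rather than the raw user question) is what the tuned Damage LMM would receive alongside the image or video frames, which is how the abstract's "enhanced prompts" step is interpreted here.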