Large multimodal model assisted underground tunnel damage inspection and human-machine interaction

Journal of Infrastructure Intelligence and Resilience Pub Date : 2025-04-01 DOI:10.1016/j.iintel.2025.100154

Yanzhi Qi , Zhi Ding , Yaozhi Luo

{"title":"Large multimodal model assisted underground tunnel damage inspection and human-machine interaction","authors":"Yanzhi Qi , Zhi Ding , Yaozhi Luo","doi":"10.1016/j.iintel.2025.100154","DOIUrl":null,"url":null,"abstract":"<div><div>Artificial Intelligence is playing an increasingly important role in tunnel inspection as a core driver of the new generation of engineering. Traditional methods are difficult to directly generate human linguistic information and lack valid messages extracted from different modalities. This paper proposes Damage LMM, a multimodal damage detection model that can handle images or videos as well as text inputs, to realize fast damage identification and human-computer interaction. The visual instruction database is first created from real damage data collected using different visual sensors and captions extracted by a regional convolutional neural network. The basic language model is then fine-tuned into a specialised Damage LMM, which enhances user instructions by integrating virtual prompt injection and system messages. Finally, the enhanced prompts are processed through the tuned multimodal model to generate a detailed visual description of the damage. The performance of the method is evaluated using a real tunnel dataset, and the results show that it has better robustness and accuracy than other models in multimodal data, with an accuracy of 0.93 for the in-domain image data and a contextual correlation of 0.94. The proposed method can effectively identify tunnel defects and realize multimodal user interaction functions with a moderate number of markers and a short delay time, which will greatly help engineers to quickly obtain effective information and assess the degree of damage at the tunnel inspection site.</div></div>","PeriodicalId":100791,"journal":{"name":"Journal of Infrastructure Intelligence and Resilience","volume":"4 3","pages":"Article 100154"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Infrastructure Intelligence and Resilience","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772991525000179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Artificial Intelligence is playing an increasingly important role in tunnel inspection as a core driver of the new generation of engineering. Traditional methods are difficult to directly generate human linguistic information and lack valid messages extracted from different modalities. This paper proposes Damage LMM, a multimodal damage detection model that can handle images or videos as well as text inputs, to realize fast damage identification and human-computer interaction. The visual instruction database is first created from real damage data collected using different visual sensors and captions extracted by a regional convolutional neural network. The basic language model is then fine-tuned into a specialised Damage LMM, which enhances user instructions by integrating virtual prompt injection and system messages. Finally, the enhanced prompts are processed through the tuned multimodal model to generate a detailed visual description of the damage. The performance of the method is evaluated using a real tunnel dataset, and the results show that it has better robustness and accuracy than other models in multimodal data, with an accuracy of 0.93 for the in-domain image data and a contextual correlation of 0.94. The proposed method can effectively identify tunnel defects and realize multimodal user interaction functions with a moderate number of markers and a short delay time, which will greatly help engineers to quickly obtain effective information and assess the degree of damage at the tunnel inspection site.

查看原文本刊更多论文

大型多模态模型辅助地下隧道损伤检测和人机交互

人工智能作为新一代工程的核心驱动力，在隧道检测中发挥着越来越重要的作用。传统方法难以直接生成人类语言信息，缺乏从不同模态中提取的有效信息。为了实现快速的损伤识别和人机交互，本文提出了一种可以处理图像或视频以及文本输入的多模态损伤检测模型——损伤LMM。视觉指令数据库首先由不同视觉传感器收集的真实损伤数据和由区域卷积神经网络提取的字幕创建。然后将基本语言模型微调为专门的损害LMM，该LMM通过集成虚拟提示注入和系统消息来增强用户指令。最后，通过调整后的多模态模型对增强的提示进行处理，以生成损坏的详细视觉描述。使用真实隧道数据集对该方法进行了性能评估，结果表明，该方法在多模态数据中具有更好的鲁棒性和精度，对域内图像数据的精度为0.93，上下文相关性为0.94。该方法可以有效识别隧道缺陷，实现多模态用户交互功能，标记数量适中，延迟时间短，将极大地帮助工程师在隧道检测现场快速获取有效信息和评估损伤程度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Infrastructure Intelligence and Resilience

CiteScore

2.10

自引率

0.00%

发文量