Multi-modal deep fusion for bridge condition assessment

Mozhgan Momtaz, Tianshu Li, Devin K. Harris, David Lattanzi
Journal of Infrastructure Intelligence and Resilience · Published 2023-10-02 · DOI: 10.1016/j.iintel.2023.100061
Full text: https://www.sciencedirect.com/science/article/pii/S2772991523000361
Citations: 0

Abstract


Bridge condition rating is a challenging task: it depends largely on the experience level of the inspector performing the manual inspection and is therefore prone to human error. An inspection report typically consists of a collection of images and sequences of sentences (text) describing the condition of the bridge under consideration. In a routine manual bridge inspection, an inspector collects a set of images and textual descriptions of bridge components and assigns an overall condition rating (ranging from 0 to 9) based on the collected information. Unfortunately, this method of bridge inspection has been shown to yield inconsistent condition ratings that correlate with inspector experience. To improve the consistency of image-text inspection data, and to predict the corresponding condition ratings, this study first provides a combined image-text dataset extracted from a collection of bridge inspection reports from the Virginia Department of Transportation. Using this dataset, we developed novel deep learning-based methods for automatic bridge condition rating prediction based on data fusion between the textual and visual data from the collected report sets.
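To make the data setup concrete, a single inspection entry pairs images and sentences with one overall 0-9 rating. The sketch below is illustrative only: the field names are hypothetical, not the authors' schema, and the rating orientation (0 worst, 9 best) follows the common NBI convention rather than anything stated in the abstract.

```python
from dataclasses import dataclass

@dataclass
class InspectionRecord:
    """One bridge-inspection entry: images, sentences, and an overall rating."""
    image_paths: list   # photos of bridge components collected on site
    sentences: list     # free-text descriptions of component condition
    rating: int         # overall condition rating, 0 (worst) to 9 (best)

    def __post_init__(self):
        # The abstract states ratings range between 0 and 9.
        if not 0 <= self.rating <= 9:
            raise ValueError("condition rating must be in [0, 9]")

record = InspectionRecord(
    image_paths=["deck_underside.jpg"],
    sentences=["Transverse cracking observed on the deck underside."],
    rating=6,
)
```

Framing each report as such image-text pairs with a single label is what lets the rating task be posed as supervised multi-modal classification.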

Our proposed multi-modal deep fusion approach constructs visual and textual representations for images and sentences separately, using appropriate encoding functions, and then fuses the image and text representations to improve multi-modal prediction of the assigned condition ratings. Moreover, we study interpretations of the deployed deep models using saliency maps, identifying the parts of the image-text inputs that are essential to condition rating predictions. The findings of this study point to potential improvements from consistent image-text inspection data collection, as well as from the proposed deep fusion model, for predicting bridge condition ratings from both visual and textual reports.
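The fusion idea in the paragraph above — encode each modality separately, fuse the representations, classify into the ten rating classes, and inspect input sensitivity — can be sketched in a few lines of NumPy. This is a minimal stand-in, not the authors' architecture: the "encoders" are random linear maps in place of deep networks, fusion is plain concatenation, and the saliency function uses finite differences rather than backpropagated gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; the paper's encoders are deep networks.
IMG_DIM, TXT_DIM, EMB_DIM, NUM_CLASSES = 64, 32, 16, 10  # 10 ratings: 0..9

# Stand-in linear "encoders" and a fusion head, randomly initialized.
W_img = rng.normal(scale=0.1, size=(IMG_DIM, EMB_DIM))
W_txt = rng.normal(scale=0.1, size=(TXT_DIM, EMB_DIM))
W_head = rng.normal(scale=0.1, size=(2 * EMB_DIM, NUM_CLASSES))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_rating(img_feat, txt_feat):
    """Encode each modality, fuse by concatenation, classify into 0-9."""
    z_img = np.tanh(img_feat @ W_img)                 # visual representation
    z_txt = np.tanh(txt_feat @ W_txt)                 # textual representation
    fused = np.concatenate([z_img, z_txt], axis=-1)   # multi-modal fusion
    return softmax(fused @ W_head)

def saliency(img_feat, txt_feat, eps=1e-4):
    """Finite-difference sensitivity of the top-class score to each image
    feature -- a crude stand-in for gradient-based saliency maps."""
    base = predict_rating(img_feat, txt_feat)
    k = int(base.argmax())
    sal = np.zeros_like(img_feat)
    for i in range(img_feat.size):
        bumped = img_feat.copy()
        bumped[i] += eps
        sal[i] = (predict_rating(bumped, txt_feat)[k] - base[k]) / eps
    return np.abs(sal)

img = rng.normal(size=IMG_DIM)
txt = rng.normal(size=TXT_DIM)
probs = predict_rating(img, txt)
rating = int(np.argmax(probs))   # predicted condition rating, 0-9
sal = saliency(img, txt)         # per-feature importance for that prediction
```

Concatenation is the simplest of the fusion strategies one could study here; attention-based or gated fusion would slot into the same `predict_rating` interface.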
