Traffic Scene Captioning with Multi-Stage Feature Enhancement

Dehai Zhang, Yu Ma, Qing Liu, Haoxing Wang, Anquan Ren, Jiashu Liang
DOI: 10.32604/cmc.2023.038264 (https://doi.org/10.32604/cmc.2023.038264)
Journal: Computers, Materials & Continua
Publication date: 2023-01-01 (Journal Article)

Abstract

Traffic scene captioning automatically generates one or more sentences describing the content of a traffic scene by analyzing the input scene image, supporting road safety while providing an important decision-making aid for sustainable transportation. To describe complex traffic scenes comprehensively and reasonably, this paper proposes a traffic scene semantic captioning model with multi-stage feature enhancement. The model follows an encoder-decoder structure. First, visual features at multiple levels of granularity are used for feature enhancement during encoding, enabling the model to capture finer detail in the traffic scene image. Second, a scene knowledge graph is applied during decoding: the semantic features it provides enhance the decoder's learned features a second time, so the model can learn the attributes of objects in the traffic scene and the relationships between them, generating more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated with five standard automatic metrics. The results show that the proposed model improves significantly over state-of-the-art methods on all metrics, in particular achieving a CIDEr-D score of 129.0, which indicates that the model can effectively provide a more reasonable and comprehensive description of the traffic scene.
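The two enhancement stages described above can be sketched as a simple pipeline. This is a minimal toy illustration, not the paper's architecture: all function names, feature vectors, and the averaging fusion are hypothetical stand-ins for the learned components (multi-granularity encoder, attention-based decoder, and knowledge-graph embedding) the abstract refers to.

```python
# Hypothetical sketch of the multi-stage feature-enhancement pipeline.
# Toy feature vectors and simple arithmetic stand in for learned modules.

def extract_multilevel_features(image):
    """Stage 1 (encoding): visual features at several granularities,
    e.g. fine-grained grid, object-region, and global image levels."""
    return {
        "grid":   [0.2, 0.4],  # toy fine-grained patch features
        "region": [0.5, 0.1],  # toy object-region features
        "global": [0.3, 0.3],  # toy whole-image feature
    }

def fuse_levels(levels):
    """First enhancement: fuse all granularity levels into one visual
    feature (an element-wise average stands in for learned fusion)."""
    keys = list(levels)
    dim = len(levels[keys[0]])
    return [sum(levels[k][i] for k in keys) / len(keys) for i in range(dim)]

def knowledge_graph_enhance(decoder_features, kg_semantics):
    """Second enhancement (decoding): re-enhance decoder features with
    semantic features from a scene knowledge graph, which encodes object
    attributes and inter-object relations (element-wise add as a stand-in)."""
    return [d + s for d, s in zip(decoder_features, kg_semantics)]

def caption_features(image, kg_semantics):
    """Run both enhancement stages; a real model would then feed the
    result to a language decoder to emit the caption sentence."""
    visual = fuse_levels(extract_multilevel_features(image))
    return knowledge_graph_enhance(visual, kg_semantics)

print(caption_features(None, [0.1, 0.1]))
```

The point of the sketch is the ordering: visual features are enhanced once during encoding (multi-granularity fusion) and the decoder's features are enhanced again with knowledge-graph semantics, which is what the paper means by "multi-stage" enhancement.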