CTAFFNet: CNN–Transformer Adaptive Feature Fusion Object Detection Algorithm for Complex Traffic Scenarios
Xinlong Dong, Peicheng Shi, Taonian Liang, Aixi Yang
Transportation Research Record: Journal of the Transportation Research Board, 2024. DOI: 10.1177/03611981241258753
Abstract
As a core technology of environmental perception systems, object detection has attracted growing attention and become an active research direction for intelligent driving vehicles. Existing CNN–Transformer hybrid models, however, have limited generalization ability, making it difficult to meet the detection requirements for small objects in complex scenes. We propose a novel convolutional neural network (CNN)–Transformer Adaptive Feature Fusion Network (CTAFFNet) for object detection. First, we design a local–global feature fusion unit, the Convolutional Transformation Adaptive Fusion Kernel (CTAFFK), which is integrated into CTAFFNet. The CTAFFK kernel uses two branches, a CNN and a Transformer, to extract local and global features from the image, and adaptively fuses the features of the two branches. Second, we develop an adaptive feature fusion strategy that combines local high-frequency and global low-frequency features to obtain comprehensive feature information. Finally, CTAFFNet employs an encoder–decoder structure that passes the fused local–global information between stages, preserving the model's generalization capability. Experiments on the large and challenging KITTI dataset demonstrate the effectiveness and efficiency of the proposed network: compared with other mainstream networks, it achieves an average precision of 91.17%, and it particularly excels at detecting small objects at longer distances, reaching 70.18% accuracy while providing real-time detection.
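The abstract does not give the CTAFFK implementation, but the described idea (a CNN branch for local high-frequency features, a Transformer branch for global low-frequency features, and an adaptive fusion of the two) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the module name `CTAFFKSketch`, the layer sizes, and the 1×1-convolution gating scheme are hypothetical and are not the authors' code.

```python
import torch
import torch.nn as nn


class CTAFFKSketch(nn.Module):
    """Hypothetical sketch of the CTAFFK idea from the abstract:
    a CNN branch extracts local features, a Transformer (self-attention)
    branch extracts global features, and a learned per-pixel gate
    adaptively fuses the two. Not the authors' implementation."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: a small convolutional block (high-frequency detail).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: multi-head self-attention over flattened tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Gate: predicts two per-pixel fusion weights from both branches.
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)                            # (B, C, H, W)
        tokens = self.norm(x.flatten(2).transpose(1, 2)) # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)      # (B, H*W, C)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)  # (B, C, H, W)
        # Adaptive fusion: softmax over the two branch weights so they
        # sum to 1 at every spatial location.
        weights = torch.softmax(
            self.gate(torch.cat([local, glob], dim=1)), dim=1
        )
        return weights[:, :1] * local + weights[:, 1:] * glob


if __name__ == "__main__":
    # Toy usage: fuse a 64-channel, 32x32 feature map.
    fusion = CTAFFKSketch(channels=64)
    out = fusion(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The softmax gate is one common way to realize "adaptive" fusion: rather than a fixed sum, the network learns, per location, how much to trust the local versus the global branch, which matches the abstract's claim of combining local high-frequency and global low-frequency information.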