Lite transformer with medium self attention for efficient traffic sign recognition

IF 3.1, Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation, Pub Date: 2025-06-14, DOI: 10.1016/j.jvcir.2025.104502

Junbi Xiao, Qi Zhang, Wenjuan Gong, Jianhang Liu

Volume 111, Article 104502. Full text: https://www.sciencedirect.com/science/article/pii/S1047320325001166

Citations: 0

Abstract

Accurate traffic sign recognition is essential for autonomous driving systems. This paper introduces the Indexing-and-Low-Rank-Medium Self Attention mechanism, a self-attention variant designed to reduce model size and computational demand. The mechanism uses indexing to establish macro-regional connections between queries and keys, and uses low-rank matrices to compute similarities efficiently, thereby reducing computational overhead. To address the feature loss that low-rank approximations can cause, particularly in critical traffic sign details, we integrate a feature enhancement technique. This technique applies selective thresholding at the outset of feature extraction, emphasizing essential features while suppressing less significant ones, without significantly increasing the parameter count. This streamlined approach is the foundation of our lightweight model, IMSA-Net. IMSA-Net achieves 81.7% accuracy on the ImageNet-1K dataset, a 3% improvement over MobileFormer, with 45.7% fewer model parameters than MobileFormer. IMSA-Net also surpasses models such as MobileFormer on traffic sign benchmarks, with accuracies of 93.75% on the German Traffic Sign Recognition Benchmark dataset and 92.97% on the Chinese Traffic Sign Database. These results support the efficiency and effectiveness of IMSA-Net in traffic sign recognition tasks.
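The abstract does not give the layer definitions, so the following is only a generic illustration of the two ideas it names, not IMSA-Net itself. It assumes a Linformer-style projection (`proj`, a hypothetical name) that compresses keys and values to rank `r`, and uses soft thresholding as a stand-in for the "selective thresholding" step. The indexing step is not sketched, and all shapes and the threshold `tau` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Zero out activations with magnitude below tau; shrink the rest by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(x, w_q, w_k, w_v, proj):
    """Attention over n tokens with keys and values compressed to r rows.

    x: (n, d) token features; proj: (r, n) with r << n (assumed projection).
    The score matrix is (n, r) rather than the usual (n, n).
    """
    q = x @ w_q                               # (n, d)
    k = proj @ (x @ w_k)                      # (r, d)
    v = proj @ (x @ w_v)                      # (r, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n, r)
    return softmax(scores) @ v                # (n, d)

rng = np.random.default_rng(0)
n, d, r = 64, 16, 8                           # illustrative sizes only
# Threshold the input features first, then attend over them.
x = soft_threshold(rng.standard_normal((n, d)), tau=0.5)
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
proj = rng.standard_normal((r, n)) / np.sqrt(n)
out = low_rank_attention(x, w_q, w_k, w_v, proj)
print(out.shape)  # (64, 16)
```

With these sizes the similarity computation touches 64 × 8 scores instead of 64 × 64, which is the kind of saving a low-rank formulation is meant to provide. How the paper chooses the rank or the threshold is not stated in the abstract.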


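Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months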

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.