Lite transformer with medium self attention for efficient traffic sign recognition
Junbi Xiao, Qi Zhang, Wenjuan Gong, Jianhang Liu
Journal of Visual Communication and Image Representation, Vol. 111, Article 104502 (published 2025-06-14)
DOI: 10.1016/j.jvcir.2025.104502
https://www.sciencedirect.com/science/article/pii/S1047320325001166
Citations: 0
Abstract
Accurate traffic sign recognition is essential for autonomous driving systems. This paper introduces Indexing-and-Low-Rank-Medium Self Attention (IMSA), a self-attention mechanism designed to reduce model size and computational cost. The mechanism uses indexing to establish macro-regional connections between queries and keys, and combines this with low-rank matrices to compute similarities efficiently, thereby lowering computational overhead. Because low-rank approximation can discard features, particularly fine traffic sign details, we integrate a feature enhancement technique that applies selective thresholding at the start of feature extraction, emphasizing essential features and suppressing less significant ones without significantly increasing the parameter count. This streamlined design forms the basis of our lightweight model, IMSA-Net. IMSA-Net reaches 81.7% accuracy on ImageNet-1K, a 3% improvement over MobileFormer, while using 45.7% fewer parameters than MobileFormer. It also outperforms models such as MobileFormer on traffic sign benchmarks, with accuracies of 93.75% on the German Traffic Sign Recognition Benchmark and 92.97% on the Chinese Traffic Sign Database. These results demonstrate the efficiency and effectiveness of IMSA-Net for traffic sign recognition.
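The abstract does not specify how IMSA is computed, so the following is only a minimal, hypothetical Python sketch of the ideas it names, not the authors' implementation. It assumes that (i) "indexing" means assigning tokens to contiguous regions and average-pooling keys and values per region, (ii) "low-rank" means projecting queries and keys to a rank r smaller than the channel width d, and (iii) "selective thresholding" can be approximated by per-channel soft thresholding, a technique substituted here for illustration. Random weights stand in for learned parameters, and every function name is illustrative. With n tokens grouped into R regions, the similarity matrix has n × R entries rather than n × n.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for learned projections (assumption)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def region_pool(rows: Matrix, region_size: int) -> Matrix:
    """Hypothetical indexing: group tokens into contiguous regions and average each."""
    pooled = []
    for start in range(0, len(rows), region_size):
        block = rows[start:start + region_size]
        pooled.append([sum(col) / len(block) for col in zip(*block)])
    return pooled


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def region_low_rank_attention(x: Matrix, region_size: int, rank: int,
                              seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Each of n tokens attends to R = ceil(n / region_size) regions, not n tokens."""
    rng = random.Random(seed)
    d = len(x[0])
    w_q = random_matrix(d, rank, rng)  # low-rank query projection, d -> rank
    w_k = random_matrix(d, rank, rng)  # low-rank key projection, d -> rank
    w_v = random_matrix(d, d, rng)     # value projection keeps full width d
    q = matmul(x, w_q)                            # (n, rank)
    k = region_pool(matmul(x, w_k), region_size)  # (R, rank)
    v = region_pool(matmul(x, w_v), region_size)  # (R, d)
    scale = 1.0 / math.sqrt(rank)
    # Similarity matrix is n x R: O(n * R * rank) work instead of O(n^2 * d).
    attn = [softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]) for qi in q]
    return matmul(attn, v), attn


def soft_threshold(features: Matrix, ratio: float = 0.5) -> Matrix:
    """Per-channel soft thresholding, substituted for the paper's selective thresholding.

    Values with magnitude at or below tau_c = ratio * mean(|x_c|) become zero;
    larger values are kept but shrunk toward zero by tau_c.
    """
    out_cols = []
    for col in zip(*features):
        tau = ratio * sum(abs(v) for v in col) / len(col)
        out_cols.append([math.copysign(max(abs(v) - tau, 0.0), v) for v in col])
    return [list(row) for row in zip(*out_cols)]


if __name__ == "__main__":
    # 8 tokens with 4 channels each; thresholding runs before attention.
    tokens = [[float((i * 3 + j) % 5) - 2.0 for j in range(4)] for i in range(8)]
    enhanced = soft_threshold(tokens)
    out, attn = region_low_rank_attention(enhanced, region_size=4, rank=2)
    print(len(attn), "x", len(attn[0]))  # 8 tokens x 2 regions
```

In this sketch, region pooling is what shrinks the attention map, while the rank-r projections shrink the cost of each similarity. A learned indexing scheme or learnable thresholds would replace the fixed choices made here.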
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.