Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles

IF 5.1; Zone 2, Engineering Technology; JCR Q1, ENGINEERING, MULTIDISCIPLINARY
Hoang Ngoc Tran, Nam Nhat Ngo Nguyen, Nhi Quynh Phan Le, Thu Anh Ngoc Le, Anh Duy Nguyen
Engineering Science and Technology, an International Journal (JESTECH), vol. 64, Article 102028. Journal Article, published 2025-03-10.
DOI: 10.1016/j.jestch.2025.102028
https://www.sciencedirect.com/science/article/pii/S2215098625000837
Citation count: 0

Abstract

Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO excels in linking object detection with contextual understanding, it faces challenges when detecting small or occluded signs. To address these limitations, we employ a lightweight LB-scSE (Linear Bottleneck Block with Simultaneous Spatial and Channel Squeeze & Excitation) architecture, thereby improving detection accuracy while significantly reducing computational overhead.
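The abstract names scSE (simultaneous spatial and channel squeeze & excitation) but does not spell out the block. As a rough illustration only, the sketch below shows the general scSE recalibration idea in plain Python: a channel gate computed from globally pooled features and a spatial gate computed from a 1x1 projection, both applied to the same feature map. The function name `scse`, the weight layout, the omission of biases, and the choice to combine the two branches by addition are assumptions made for this example; how the authors embed the block in their linear bottleneck is not stated in the abstract.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def scse(x, w1, w2, ws):
    """Illustrative scSE recalibration (not the authors' implementation).

    x  : feature map as nested lists with shape [C][H][W]
    w1 : [R][C] weights of the first channel-branch layer (R = reduced width)
    w2 : [C][R] weights of the second channel-branch layer
    ws : [C] weights of the 1x1 projection used by the spatial branch
    Biases are omitted to keep the sketch short.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])

    # Channel branch: global average pool, then FC -> ReLU -> FC -> sigmoid.
    pooled = [sum(sum(row) for row in x[c]) / (H * W) for c in range(C)]
    hidden = [max(0.0, sum(w1[r][c] * pooled[c] for c in range(C)))
              for r in range(len(w1))]
    channel_gate = [sigmoid(sum(w2[c][r] * hidden[r] for r in range(len(hidden))))
                    for c in range(C)]

    # Spatial branch: project all channels to one map, then sigmoid.
    spatial_gate = [[sigmoid(sum(ws[c] * x[c][i][j] for c in range(C)))
                     for j in range(W)] for i in range(H)]

    # Assumed combination: sum of the two recalibrated maps.
    return [[[x[c][i][j] * (channel_gate[c] + spatial_gate[i][j])
              for j in range(W)] for i in range(H)] for c in range(C)]
```

With all weights set to zero, both gates evaluate to 0.5, so the block returns its input unchanged; trained weights would instead emphasise informative channels and locations, which is the property the abstract relies on for small or occluded signs.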
We evaluate our framework on the custom DINO&GTSRBv1 dataset, where the GroundingDINO Pro model achieves an mAP@50 of 68.52%. The self-distilled network further reduces model size tenfold compared to baseline models (e.g., MobileNetV2, VGG16, ResNet18), yet maintains competitive accuracy, providing a robust, resource-efficient solution for real-time deployment. Our results indicate that integrating semantic grounding with distillation-based compression not only enhances traffic sign detection performance but also delivers a scalable and efficient approach for complex traffic environments. Additionally, our method outperforms standard architectures such as MobileNetV1-2, VGG16-19, and ResNet34-50, demonstrating higher detection accuracy and lower resource consumption, thus reinforcing its suitability for real-world autonomous driving scenarios.
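For readers unfamiliar with the metric, mAP@50 scores a predicted box as correct when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below shows only that matching criterion, not the full average-precision computation; the helper names `iou` and `matches_at_50` and the `(x1, y1, x2, y2)` box layout are assumptions for this example and do not come from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count at IoU 0.5."""
    return iou(pred_box, gt_box) >= threshold
```

Small signs are hard under this criterion because a shift of a few pixels removes a large share of the overlap, which is consistent with the difficulty the abstract reports for small or occluded signs.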
Source journal
Engineering Science and Technology-An International Journal-Jestech (Materials Science-Electronic, Optical and Magnetic Materials)
CiteScore: 11.20
Self-citation rate: 3.50%
Articles published: 153
Review time: 22 days
About the journal: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high-quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science, and aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects including:
- Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
- Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
- Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)