跨域多步骤思考:零采样细粒度交通标志识别

IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yaozong Gan , Guang Li , Ren Togo , Keisuke Maeda , Takahiro Ogawa , Miki Haseyama
{"title":"跨域多步骤思考:零采样细粒度交通标志识别","authors":"Yaozong Gan ,&nbsp;Guang Li ,&nbsp;Ren Togo ,&nbsp;Keisuke Maeda ,&nbsp;Takahiro Ogawa ,&nbsp;Miki Haseyama","doi":"10.1016/j.knosys.2025.114172","DOIUrl":null,"url":null,"abstract":"<div><div>In this study, we propose <strong>C</strong>ross-<strong>d</strong>omain <strong>M</strong>ulti-step <strong>T</strong>hinking (<strong>CdMT</strong>) to improve zero-shot fine-grained traffic sign recognition (TSR) performance in the wild. Zero-shot fine-grained TSR in the wild is challenging due to the cross-domain problem between clean template traffic signs and real-world counterparts, and existing approaches particularly struggle with cross-country TSR scenarios, where traffic signs typically differ between countries. The proposed CdMT framework tackles these challenges by leveraging the multi-step reasoning capabilities of large multimodal models (LMMs). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for LMMs. Context descriptions, which are enhanced by center coordinate prompt optimization, enable the precise localization of target traffic signs in complex road images and filter irrelevant responses via novel prior traffic sign hypotheses. Characteristic descriptions, which are derived from in-context learning with template traffic signs, bridge cross-domain gaps and enhance fine-grained TSR. Differential descriptions refine the multimodal reasoning ability of LMMs by distinguishing subtle differences among similar signs. CdMT is independent of training data and requires only simple and uniform instructions, enabling it to achieve cross-country TSR. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries. The proposed CdMT framework achieved superior performance compared with other state-of-the-art methods on all five datasets, with recognition accuracies of 0.93, 0.89, 0.97, 0.89, and 0.85 on the GTSRB, BTSD, TT-100K, Sapporo, and Yokohama datasets, respectively.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114172"},"PeriodicalIF":7.6000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-domain Multi-step Thinking: Zero-shot Fine-grained Traffic Sign Recognition in the Wild\",\"authors\":\"Yaozong Gan ,&nbsp;Guang Li ,&nbsp;Ren Togo ,&nbsp;Keisuke Maeda ,&nbsp;Takahiro Ogawa ,&nbsp;Miki Haseyama\",\"doi\":\"10.1016/j.knosys.2025.114172\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In this study, we propose <strong>C</strong>ross-<strong>d</strong>omain <strong>M</strong>ulti-step <strong>T</strong>hinking (<strong>CdMT</strong>) to improve zero-shot fine-grained traffic sign recognition (TSR) performance in the wild. Zero-shot fine-grained TSR in the wild is challenging due to the cross-domain problem between clean template traffic signs and real-world counterparts, and existing approaches particularly struggle with cross-country TSR scenarios, where traffic signs typically differ between countries. The proposed CdMT framework tackles these challenges by leveraging the multi-step reasoning capabilities of large multimodal models (LMMs). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for LMMs. Context descriptions, which are enhanced by center coordinate prompt optimization, enable the precise localization of target traffic signs in complex road images and filter irrelevant responses via novel prior traffic sign hypotheses. Characteristic descriptions, which are derived from in-context learning with template traffic signs, bridge cross-domain gaps and enhance fine-grained TSR. Differential descriptions refine the multimodal reasoning ability of LMMs by distinguishing subtle differences among similar signs. CdMT is independent of training data and requires only simple and uniform instructions, enabling it to achieve cross-country TSR. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries. The proposed CdMT framework achieved superior performance compared with other state-of-the-art methods on all five datasets, with recognition accuracies of 0.93, 0.89, 0.97, 0.89, and 0.85 on the GTSRB, BTSD, TT-100K, Sapporo, and Yokohama datasets, respectively.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"327 \",\"pages\":\"Article 114172\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125012134\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125012134","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

在这项研究中,我们提出了跨域多步骤思维(Cross-domain Multi-step Thinking, CdMT)来提高零射击细粒度交通标志识别(TSR)的性能。由于干净的模板交通标志与真实世界的交通标志之间存在跨域问题,因此零采样细粒度TSR在野外是具有挑战性的,而现有的方法尤其难以应对跨国TSR场景,因为各国的交通标志通常不同。提出的CdMT框架通过利用大型多模态模型(lmm)的多步推理能力来解决这些挑战。我们引入上下文、特征和差异描述来设计lmm的多重思维过程。通过中心坐标提示优化增强的上下文描述,可以在复杂的道路图像中精确定位目标交通标志,并通过新的先验交通标志假设过滤不相关的响应。基于模板交通标志的上下文学习的特征描述可以弥补跨领域的差距,增强细粒度的TSR。差分描述通过区分相似符号之间的细微差异,改进了lmm的多模态推理能力。CdMT不依赖于训练数据,只需要简单统一的指令,就可以实现跨国TSR。我们在三个基准数据集和两个来自不同国家的真实数据集上进行了广泛的实验。与其他最先进的方法相比,所提出的CdMT框架在所有五个数据集上都取得了更好的性能,在GTSRB、BTSD、TT-100K、Sapporo和Yokohama数据集上的识别准确率分别为0.93、0.89、0.97、0.89和0.85。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Cross-domain Multi-step Thinking: Zero-shot Fine-grained Traffic Sign Recognition in the Wild
In this study, we propose Cross-domain Multi-step Thinking (CdMT) to improve zero-shot fine-grained traffic sign recognition (TSR) performance in the wild. Zero-shot fine-grained TSR in the wild is challenging due to the cross-domain problem between clean template traffic signs and real-world counterparts, and existing approaches particularly struggle with cross-country TSR scenarios, where traffic signs typically differ between countries. The proposed CdMT framework tackles these challenges by leveraging the multi-step reasoning capabilities of large multimodal models (LMMs). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for LMMs. Context descriptions, which are enhanced by center coordinate prompt optimization, enable the precise localization of target traffic signs in complex road images and filter irrelevant responses via novel prior traffic sign hypotheses. Characteristic descriptions, which are derived from in-context learning with template traffic signs, bridge cross-domain gaps and enhance fine-grained TSR. Differential descriptions refine the multimodal reasoning ability of LMMs by distinguishing subtle differences among similar signs. CdMT is independent of training data and requires only simple and uniform instructions, enabling it to achieve cross-country TSR. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries. The proposed CdMT framework achieved superior performance compared with other state-of-the-art methods on all five datasets, with recognition accuracies of 0.93, 0.89, 0.97, 0.89, and 0.85 on the GTSRB, BTSD, TT-100K, Sapporo, and Yokohama datasets, respectively.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Knowledge-Based Systems
Knowledge-Based Systems 工程技术-计算机:人工智能
CiteScore
14.80
自引率
12.50%
发文量
1245
审稿时长
7.8 months
期刊介绍: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信