An algorithm for multi-directional text detection in natural scenes

IF 2.9, Zone 3 (Engineering & Technology), JCR Q2 (Engineering, Electrical & Electronic)

Digital Signal Processing, Pub Date: 2025-07-17, DOI: 10.1016/j.dsp.2025.105482

Dapeng Wan, Lixia Deng, Jinshun Dong, Meiqi Guo, Jianqin Yin, Chenxu Liu, Haiying Liu

Digital Signal Processing, Volume 168, Article 105482. Publisher page: https://www.sciencedirect.com/science/article/pii/S1051200425005044

Citations: 0

Abstract

Text detection in natural scenes is challenging because of background interference and scale variation, and applications such as autonomous driving and image understanding place high demands on both detection accuracy and efficiency. Efficient detection algorithms designed specifically for natural-scene text are therefore of particular importance. To this end, this paper proposes Multi-Directional Text You Only Look Once (MDT-YOLO). First, a Dual Path Residual Connection (DPRC) block is designed to strengthen the model's multi-scale feature perception and reduce missed detections caused by scale variation. Second, to reduce the loss of text information during downsampling, a Depthwise Separable Strided Downsampling (DSSDown) module is proposed, improving the model's ability to recognize fine-grained text regions. In addition, an Efficient Down-Transition (EDT) module is constructed to rebuild the backbone network, improving semantic modeling and computational efficiency together. Experimental results show that, compared with the baseline model, MDT-YOLO reduces the parameter count by 29.4% while keeping processing speed essentially unchanged. On the MSRA-TD500 dataset, Precision and mAP@0.5 improve by 3.2% and 2.8%, respectively; on the HUST-TR400 dataset, they improve by 1.7% and 1.3%, respectively. The code will be available at https://github.com/WDP-0806/MDT-YOLO.
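The abstract does not describe the internals of DSSDown, so the following is only a minimal sketch of the generic idea its name suggests: replacing a standard strided convolution with a depthwise convolution followed by a 1 x 1 pointwise convolution, and counting the weights of each. The function names and the channel widths (128 to 256, 3 x 3 kernel, 640-pixel input) are hypothetical and chosen for illustration; the result is not the source of the 29.4% figure reported for the full model.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights in a standard k x k convolution (stride does not change the count)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dw_separable_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)    # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def strided_output_size(size: int, k: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution: floor((size + 2*padding - k) / stride) + 1."""
    return (size + 2 * padding - k) // stride + 1


if __name__ == "__main__":
    # Hypothetical layer shape, not taken from the paper.
    c_in, c_out, k = 128, 256, 3
    standard = conv_params(c_in, c_out, k)
    separable = dw_separable_params(c_in, c_out, k)
    print(f"standard conv weights:  {standard}")
    print(f"separable conv weights: {separable}")
    print(f"ratio:                  {separable / standard:.4f}")
    print(f"640 px downsampled to:  {strided_output_size(640, k, stride=2, padding=1)} px")
```

Without bias terms the ratio of separable to standard weights is 1/c_out + 1/k^2, so for a 3 x 3 kernel the factorized layer needs a little over one ninth of the weights while a stride of 2 still halves the spatial resolution.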




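Source journal

Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 5.30

Self-citation rate: 17.20%

Articles published: 435

Review time: 66 days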

About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:

• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• cheminformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy