FCCText: frequency-color complementary bistream structure for scene text detection

Impact Factor: 1.0 · CAS Region 4, Computer Science · JCR Q4, ENGINEERING, ELECTRICAL & ELECTRONIC
Ruiyi Han, Xin Li
DOI: 10.1117/1.jei.33.4.043037 (https://doi.org/10.1117/1.jei.33.4.043037)
Journal of Electronic Imaging · Journal Article · Published 2024-08-01
Citations: 0

Abstract

Current scene text detection methods rely mainly on RGB-domain information for text localization, and their performance is not fully realized in many challenging scenes. Because the RGB features of text and background in complex environments differ only subtly, yet are more discernible in the frequency domain, we argue that frequency-domain information can effectively complement RGB-domain features and jointly enhance text detection. To this end, we propose a network with complementary frequency-domain semantic and color features, called the bistream structure, to facilitate text detection in scenes with a wide variety of complex patterns. Our approach uses a frequency perception module (FPM) that converts features extracted by the backbone into the frequency domain, strengthening the ability to distinguish text from complex backgrounds and yielding a coarse localization of text. Frequency-domain features efficiently reveal text structures obscured by background noise in the RGB domain, producing a sharper separation between text and background elements in challenging scenarios. Moreover, we propose a complementary correction module that uses the coarse localization results to guide the fusion of multi-level RGB features, progressively refining the segmentation results and correcting the frequency-domain features. Extensive experiments on the Total-Text, CTW1500, and MSRA-TD500 datasets demonstrate that our method achieves outstanding performance in scene text detection.
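The core idea behind the FPM — transforming backbone feature maps into the frequency domain so that text strokes obscured by RGB-domain background noise become easier to separate — can be illustrated with a minimal sketch. The snippet below is a hypothetical, simplified stand-in (a fixed high-pass FFT filter over a C×H×W feature map), not the paper's learned module; the function name `frequency_enhance` and the `cutoff` parameter are assumptions for illustration only.

```python
import numpy as np

def frequency_enhance(feat, cutoff=0.25):
    """Illustrative frequency-domain filtering of a CxHxW feature map.

    High-frequency components tend to carry fine stroke structure, so a
    simple high-pass mask applied in the FFT domain can sharpen
    text/background contrast. (Hypothetical sketch; the paper's FPM is
    a learned module, not a fixed filter.)
    """
    C, H, W = feat.shape
    # 2D FFT per channel, shifted so the DC component sits at the center.
    F = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    # Normalized radial distance from the spectrum center.
    yy, xx = np.mgrid[0:H, 0:W]
    cy, cx = H / 2.0, W / 2.0
    radius = np.sqrt(((yy - cy) / H) ** 2 + ((xx - cx) / W) ** 2)
    # Keep only frequencies beyond the cutoff (suppress smooth background).
    mask = (radius >= cutoff).astype(feat.dtype)
    out = np.fft.ifft2(np.fft.ifftshift(F * mask, axes=(-2, -1)), axes=(-2, -1))
    return out.real

feat = np.random.rand(4, 32, 32).astype(np.float32)
enhanced = frequency_enhance(feat)
print(enhanced.shape)  # (4, 32, 32)
```

Because the mask removes the DC component, a perfectly uniform (textless) region maps to zero response, while stroke edges survive — a toy analogue of how frequency cues can separate text from a visually similar background.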
Source journal
Journal of Electronic Imaging
Category: Engineering & Technology — Imaging Science & Photographic Technology
CiteScore: 1.70
Self-citation rate: 27.30%
Articles per year: 341
Review time: 4.0 months
Journal description: The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.