FCCText: frequency-color complementary bistream structure for scene text detection

IF 1, Zone 4 (Computer Science), Q4 ENGINEERING, ELECTRICAL & ELECTRONIC

Journal of Electronic Imaging Pub Date : 2024-08-01 DOI:10.1117/1.jei.33.4.043037

Ruiyi Han, Xin Li

Citations: 0

Abstract

Current scene text detection methods rely mainly on RGB-domain information for text localization, and their performance remains limited in many challenging scenes. Because the differences between the RGB features of text and background are subtle in complex environments but more discernible in the frequency domain, we argue that frequency-domain information can effectively complement RGB-domain features and jointly improve text detection. To this end, we propose a network with complementary frequency-domain semantic features and color features, called the bistream structure, to facilitate text detection in scenes with a wide variety of complex patterns. Our approach uses a frequency perception module (FPM) that converts features extracted by the backbone into the frequency domain to better distinguish text from complex backgrounds, thereby achieving coarse localization of text. Frequency-domain features reveal text structures obscured by background noise in the RGB domain, yielding a sharper differentiation between text and background in challenging scenarios. Moreover, we propose a complementary correction module that uses the coarse localization results to guide the fusion of multi-level RGB features, progressively refining the segmentation results and thereby correcting the frequency-domain features. Extensive experiments on the Total-Text, CTW1500, and MSRA-TD500 datasets demonstrate that our method achieves outstanding performance in scene text detection.
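The abstract does not specify how the FPM moves backbone features into the frequency domain, so the following is only a minimal sketch of the general idea, not the authors' module. It assumes a single 2-D feature map, a 2-D FFT, and a hard high-pass mask; the function name `high_pass_response` and the `cutoff` parameter are hypothetical and chosen for illustration.

```python
import numpy as np


def high_pass_response(feature: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Illustrative frequency-domain filter (hypothetical, not the paper's FPM).

    feature: 2-D array of shape (H, W), e.g. one channel of a feature map.
    cutoff:  radial frequency in cycles per sample; components whose radius
             is below this value (including the DC term) are zeroed.
    Returns the magnitude of the map rebuilt from the remaining frequencies.
    """
    h, w = feature.shape
    spectrum = np.fft.fft2(feature)           # spatial domain -> frequency domain

    # Radial frequency of every FFT bin, in cycles per sample.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Keep only the higher frequencies, where strokes and edges concentrate.
    mask = radius >= cutoff
    filtered = np.fft.ifft2(spectrum * mask).real
    return np.abs(filtered)


if __name__ == "__main__":
    # Low-contrast "text" patch on a flat background (synthetic toy data).
    image = np.full((32, 32), 0.5)
    image[8:24, 8:24] = 0.6
    response = high_pass_response(image)
    print("mean response:", float(response.mean()))
```

Removing the low frequencies discards the flat background level, so the response depends only on local structure; a learned module would presumably replace the fixed mask with trainable weights, which is again an assumption rather than something the abstract states.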

Source journal

Journal of Electronic Imaging (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore

1.70

Self-citation rate

27.30%

Articles published

341

Review time

4.0 months

About the journal: The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.