FCCText: frequency-color complementary bistream structure for scene text detection
Authors: Ruiyi Han, Xin Li
Journal of Electronic Imaging, published 2024-08-01
DOI: https://doi.org/10.1117/1.jei.33.4.043037
Current scene text detection methods rely mainly on RGB-domain information for text localization, and their potential has not been fully realized in many challenging scenes. Because the differences between text and background in complex environments are subtle in RGB features but more discernible in the frequency domain, we argue that frequency-domain information can effectively complement RGB-domain features and jointly improve text detection. To this end, we propose a network with complementary frequency-domain semantic and color features, called the bistream structure, to facilitate text detection in scenes with a wide variety of complex patterns. Our approach uses a frequency perception module (FPM) that converts features extracted by the backbone into the frequency domain, making text easier to distinguish from a complex background and yielding a coarse localization of text. Frequency-domain features reveal text structures that background noise obscures in the RGB domain, giving a sharper separation between text and background in challenging scenarios. Moreover, we propose a complementary correction module that uses the coarse localization results to guide the fusion of multi-level RGB features, progressively refining the segmentation results and thereby correcting the frequency-domain features. Extensive experiments on the Total-Text, CTW1500, and MSRA-TD500 datasets demonstrate that our method achieves outstanding performance in scene text detection.
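The abstract does not specify how the FPM operates internally, so the following is only a rough illustration of the general idea of moving a feature map into the frequency domain to bring out fine, text-like structure. It is a sketch under stated assumptions, not the authors' module: the function name `frequency_highpass_response`, the `cutoff` parameter, and the use of a hard radial high-pass filter are all hypothetical choices made for illustration.

```python
import numpy as np


def frequency_highpass_response(feature_map, cutoff=0.1):
    """Return a non-negative map of high-frequency content.

    Illustrative only (NOT the paper's FPM): a single-channel feature
    map of shape (H, W) is transformed with a 2D FFT, frequencies whose
    radius is at or below `cutoff` (in cycles per sample) are zeroed,
    and the result is transformed back to the spatial domain.
    """
    h, w = feature_map.shape
    spectrum = np.fft.fft2(feature_map)

    # Per-axis frequencies in cycles per sample, range [-0.5, 0.5).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)

    # Hard high-pass mask: drop the DC term and slow variations.
    mask = radius > cutoff

    # The mask is symmetric in frequency, so the inverse is real up to
    # rounding error; take the real part and then its magnitude.
    filtered = np.fft.ifft2(spectrum * mask).real
    return np.abs(filtered)
```

Thresholding such a response map would give one crude form of coarse localization, since regions with dense strokes tend to carry more high-frequency energy than smooth background. A learned module would replace the fixed mask with trainable weights.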
About the journal:
The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.