Temporal Integration for Word-Wise Caption and Scene Text Identification

Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, A. W. Wahab
Published in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/ICDAR.2017.65
Publication date: November 2017
Cited by: 4

Abstract

Generally, video contains edited text (i.e., caption text) and natural text (i.e., scene text), and these two types of text differ in both nature and characteristics. These differing behaviors of caption and scene text lead to poor text-recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherence for classifying caption and scene text. We use high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients in an input word image. The proposed method studies the distribution of text candidates over word images, exploiting the fact that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high again in the third zone. This distribution is captured by mapping standard-deviation values to 8 equal-sized bins spanning the range of those values. The correlation among bins at the first and second wavelet levels is used to differentiate caption from scene text and to determine the number of temporal frames to analyze. The properties of caption and scene text are then validated over the chosen temporal frames to find a stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in classification rate and, building on the classification results, significantly improves recognition rate.
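The zone and bin analysis sketched in the abstract can be illustrated with a small example. The one-level Haar detail filter, the candidate threshold, and all helper names below are illustrative assumptions for clarity, not the authors' actual implementation or parameters:

```python
# Minimal sketch (assumed details): extract high-frequency coefficients from a
# word image, measure their standard deviation in three vertical zones, and
# map the values into 8 equal-width bins, as outlined in the abstract.
import statistics

def haar_high_freq(row):
    """One-level 1-D Haar transform of a pixel row; return detail coefficients."""
    return [(row[i] - row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]

def zone_stds(image, n_zones=3, thresh=5.0):
    """Std. dev. of 'text candidate' coefficients (|detail| > thresh) per zone."""
    h = len(image)
    stds = []
    for z in range(n_zones):
        rows = image[z * h // n_zones:(z + 1) * h // n_zones]
        cands = [abs(c) for r in rows for c in haar_high_freq(r) if abs(c) > thresh]
        stds.append(statistics.pstdev(cands) if len(cands) > 1 else 0.0)
    return stds

def to_bins(values, n_bins=8):
    """Map values into n_bins equal-width bins spanning their range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for flat input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For a caption word, the abstract suggests the three zone statistics would follow a high-low-high pattern, and comparing the resulting bin vectors across wavelet levels would then drive the caption/scene decision.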