Temporal Integration for Word-Wise Caption and Scene Text Identification

Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, A. W. Wahab
Published in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/ICDAR.2017.65
Publication date: November 2017
Cited by: 4

Abstract

Generally, video contains edited text (i.e., caption text) and natural text (i.e., scene text), and these two types of text differ in both nature and characteristics. These differing behaviors of caption and scene text lead to poor text-recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherence for classifying caption and scene text. We use high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients in an input word image. The proposed method studies the distribution of text candidates over word images, exploiting the fact that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high again in the third zone. This distribution is captured by mapping standard-deviation values to 8 equal-sized bins spanning the range of those values. The correlation among bins at the first and second wavelet levels is used to differentiate caption from scene text and to determine the number of temporal frames to analyze. The properties of caption and scene text are then validated over the chosen temporal frames to find a stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in classification rate and, building on the classification results, significantly improves recognition rate.
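The zone and bin analysis sketched in the abstract can be illustrated with a small example. The one-level Haar detail filter, the candidate threshold, and all helper names below are illustrative assumptions for clarity, not the authors' actual implementation or parameters:

```python
# Minimal sketch (assumed details): extract high-frequency coefficients from a
# word image, measure their standard deviation in three vertical zones, and
# map the values into 8 equal-width bins, as outlined in the abstract.
import statistics

def haar_high_freq(row):
    """One-level 1-D Haar transform of a pixel row; return detail coefficients."""
    return [(row[i] - row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]

def zone_stds(image, n_zones=3, thresh=5.0):
    """Std. dev. of 'text candidate' coefficients (|detail| > thresh) per zone."""
    h = len(image)
    stds = []
    for z in range(n_zones):
        rows = image[z * h // n_zones:(z + 1) * h // n_zones]
        cands = [abs(c) for r in rows for c in haar_high_freq(r) if abs(c) > thresh]
        stds.append(statistics.pstdev(cands) if len(cands) > 1 else 0.0)
    return stds

def to_bins(values, n_bins=8):
    """Map values into n_bins equal-width bins spanning their range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for flat input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For a caption word, the abstract suggests the three zone statistics would follow a high-low-high pattern, and comparing the resulting bin vectors across wavelet levels would then drive the caption/scene decision.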