New Tampered Features for Scene and Caption Text Classification in Video Frame

Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, C. Tan
{"title":"视频帧中场景和字幕文本分类的新篡改特征","authors":"Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, C. Tan","doi":"10.1109/ICFHR.2016.0020","DOIUrl":null,"url":null,"abstract":"The presence of both caption/graphics/superimposed and scene texts in video frames is the major cause for the poor accuracy of text recognition methods. This paper proposes an approach for identifying tampered information by analyzing the spatial distribution of DCT coefficients in a new way for classifying caption and scene text. Since caption text is edited/superimposed, which results in artificially created texts comparing to scene texts that exist naturally in frames. We exploit this fact to identify the presence of caption and scene texts in video frames based on the advantage of DCT coefficients. The proposed method analyzes the distributions of both zero and non-zero coefficients (only positive values) locally by moving a window, and studies histogram operations over each input text line image. This generates line graphs for respective zero and non-zero coefficient coordinates. We further study the behavior of text lines, namely, linearity and smoothness based on centroid location analysis, and the principal axis direction of each text line for classification. Experimental results on standard datasets, namely, ICDAR 2013 video, 2015 video, YVT video and our own data, show that the performances of text recognition methods are improved significantly after-classification compared to before-classification.","PeriodicalId":194844,"journal":{"name":"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"New Tampered Features for Scene and Caption Text Classification in Video Frame\",\"authors\":\"Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, C. Tan\",\"doi\":\"10.1109/ICFHR.2016.0020\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The presence of both caption/graphics/superimposed and scene texts in video frames is the major cause for the poor accuracy of text recognition methods. This paper proposes an approach for identifying tampered information by analyzing the spatial distribution of DCT coefficients in a new way for classifying caption and scene text. Since caption text is edited/superimposed, which results in artificially created texts comparing to scene texts that exist naturally in frames. We exploit this fact to identify the presence of caption and scene texts in video frames based on the advantage of DCT coefficients. The proposed method analyzes the distributions of both zero and non-zero coefficients (only positive values) locally by moving a window, and studies histogram operations over each input text line image. This generates line graphs for respective zero and non-zero coefficient coordinates. We further study the behavior of text lines, namely, linearity and smoothness based on centroid location analysis, and the principal axis direction of each text line for classification. 
Experimental results on standard datasets, namely, ICDAR 2013 video, 2015 video, YVT video and our own data, show that the performances of text recognition methods are improved significantly after-classification compared to before-classification.\",\"PeriodicalId\":194844,\"journal\":{\"name\":\"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICFHR.2016.0020\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFHR.2016.0020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12

Abstract

The presence of both caption (graphics/superimposed) text and scene text in video frames is a major cause of poor accuracy in text recognition methods. This paper proposes an approach that identifies tampered information by analyzing the spatial distribution of DCT coefficients in a new way, in order to classify caption and scene text. Because caption text is edited and superimposed onto frames, it is artificially created, whereas scene text exists naturally in frames. We exploit this fact, together with the properties of DCT coefficients, to identify the presence of caption and scene text in video frames. The proposed method analyzes the distributions of zero and non-zero (positive-valued) coefficients locally with a moving window and performs histogram operations over each input text-line image. This generates line graphs for the respective zero and non-zero coefficient coordinates. We further study the behavior of each text line, namely its linearity and smoothness based on centroid location analysis, and its principal axis direction, for classification. Experimental results on standard datasets, namely the ICDAR 2013 video, ICDAR 2015 video, and YVT video datasets, as well as our own data, show that the performance of text recognition methods improves significantly after classification compared to before classification.
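The abstract describes the method only at a high level. As a rough illustration of the kind of analysis it refers to, the sketch below counts zero and positive DCT coefficients in a sliding window over a grayscale text-line image, producing profiles that can be plotted as line graphs, and then derives simple centroid and principal-axis features from such a profile. The function names, window size, zero tolerance, and the PCA-based axis estimate are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.fft import dctn  # 2-D DCT over each local window


def dct_coefficient_profiles(text_line_gray, win=8, step=4, zero_tol=1e-6):
    """Slide a window across a grayscale text-line image and, at each
    position, count zero and positive-valued DCT coefficients.
    Returns two 1-D profiles (zero counts, positive counts) that can be
    plotted as line graphs over window positions."""
    img = text_line_gray.astype(np.float32)
    h, w = img.shape
    zero_counts, pos_counts = [], []
    for x in range(0, w - win + 1, step):
        block = img[:, x:x + win]                      # vertical strip of the text line
        coeffs = dctn(block, type=2, norm="ortho")     # local spatial-frequency content
        zero_counts.append(int(np.sum(np.abs(coeffs) < zero_tol)))
        pos_counts.append(int(np.sum(coeffs > zero_tol)))  # positive values only
    return np.array(zero_counts), np.array(pos_counts)


def centroid_features(profile):
    """Simple cues for linearity/smoothness of a coefficient profile:
    the centroid of the (position, count) points and the principal axis
    direction of that point cloud, estimated via an eigen-decomposition
    of its covariance matrix."""
    x = np.arange(len(profile), dtype=np.float32)
    pts = np.stack([x, profile.astype(np.float32)], axis=1)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    principal_axis = eigvecs[:, np.argmax(eigvals)]    # direction of largest variance
    return centroid, principal_axis
```

In this reading, artificially superimposed caption text tends to yield more regular coefficient profiles (flatter, smoother line graphs) than scene text, so features such as the centroid location and principal-axis direction of the profiles could feed a classifier; the exact decision rule used by the authors is not specified in the abstract.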