New Tampered Features for Scene and Caption Text Classification in Video Frame

Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, C. Tan
{"title":"视频帧中场景和字幕文本分类的新篡改特征","authors":"Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, C. Tan","doi":"10.1109/ICFHR.2016.0020","DOIUrl":null,"url":null,"abstract":"The presence of both caption/graphics/superimposed and scene texts in video frames is the major cause for the poor accuracy of text recognition methods. This paper proposes an approach for identifying tampered information by analyzing the spatial distribution of DCT coefficients in a new way for classifying caption and scene text. Since caption text is edited/superimposed, which results in artificially created texts comparing to scene texts that exist naturally in frames. We exploit this fact to identify the presence of caption and scene texts in video frames based on the advantage of DCT coefficients. The proposed method analyzes the distributions of both zero and non-zero coefficients (only positive values) locally by moving a window, and studies histogram operations over each input text line image. This generates line graphs for respective zero and non-zero coefficient coordinates. We further study the behavior of text lines, namely, linearity and smoothness based on centroid location analysis, and the principal axis direction of each text line for classification. Experimental results on standard datasets, namely, ICDAR 2013 video, 2015 video, YVT video and our own data, show that the performances of text recognition methods are improved significantly after-classification compared to before-classification.","PeriodicalId":194844,"journal":{"name":"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"New Tampered Features for Scene and Caption Text Classification in Video Frame\",\"authors\":\"Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, C. Tan\",\"doi\":\"10.1109/ICFHR.2016.0020\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The presence of both caption/graphics/superimposed and scene texts in video frames is the major cause for the poor accuracy of text recognition methods. This paper proposes an approach for identifying tampered information by analyzing the spatial distribution of DCT coefficients in a new way for classifying caption and scene text. Since caption text is edited/superimposed, which results in artificially created texts comparing to scene texts that exist naturally in frames. We exploit this fact to identify the presence of caption and scene texts in video frames based on the advantage of DCT coefficients. The proposed method analyzes the distributions of both zero and non-zero coefficients (only positive values) locally by moving a window, and studies histogram operations over each input text line image. This generates line graphs for respective zero and non-zero coefficient coordinates. We further study the behavior of text lines, namely, linearity and smoothness based on centroid location analysis, and the principal axis direction of each text line for classification. 
Experimental results on standard datasets, namely, ICDAR 2013 video, 2015 video, YVT video and our own data, show that the performances of text recognition methods are improved significantly after-classification compared to before-classification.\",\"PeriodicalId\":194844,\"journal\":{\"name\":\"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICFHR.2016.0020\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFHR.2016.0020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12

Abstract

The presence of both caption (graphics/superimposed) text and scene text in video frames is a major cause of poor accuracy in text recognition methods. This paper proposes an approach that identifies tampered information by analyzing the spatial distribution of DCT coefficients in a new way, in order to classify caption and scene text. Because caption text is edited and superimposed onto frames, it is artificially created, whereas scene text exists naturally in frames. We exploit this fact, together with the properties of DCT coefficients, to identify the presence of caption and scene text in video frames. The proposed method analyzes the distributions of zero and non-zero (positive-valued) coefficients locally with a moving window and performs histogram operations over each input text-line image. This generates line graphs for the respective zero and non-zero coefficient coordinates. We further study the behavior of each text line, namely its linearity and smoothness based on centroid location analysis, and its principal axis direction, for classification. Experimental results on standard datasets, namely the ICDAR 2013 video, ICDAR 2015 video, and YVT video datasets, as well as our own data, show that the performance of text recognition methods improves significantly after classification compared to before classification.
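The abstract describes the method only at a high level. As a rough illustration of the kind of analysis it refers to, the sketch below counts zero and positive DCT coefficients in a sliding window over a grayscale text-line image, producing profiles that can be plotted as line graphs, and then derives simple centroid and principal-axis features from such a profile. The function names, window size, zero tolerance, and the PCA-based axis estimate are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.fft import dctn  # 2-D DCT over each local window


def dct_coefficient_profiles(text_line_gray, win=8, step=4, zero_tol=1e-6):
    """Slide a window across a grayscale text-line image and, at each
    position, count zero and positive-valued DCT coefficients.
    Returns two 1-D profiles (zero counts, positive counts) that can be
    plotted as line graphs over window positions."""
    img = text_line_gray.astype(np.float32)
    h, w = img.shape
    zero_counts, pos_counts = [], []
    for x in range(0, w - win + 1, step):
        block = img[:, x:x + win]                      # vertical strip of the text line
        coeffs = dctn(block, type=2, norm="ortho")     # local spatial-frequency content
        zero_counts.append(int(np.sum(np.abs(coeffs) < zero_tol)))
        pos_counts.append(int(np.sum(coeffs > zero_tol)))  # positive values only
    return np.array(zero_counts), np.array(pos_counts)


def centroid_features(profile):
    """Simple cues for linearity/smoothness of a coefficient profile:
    the centroid of the (position, count) points and the principal axis
    direction of that point cloud, estimated via an eigen-decomposition
    of its covariance matrix."""
    x = np.arange(len(profile), dtype=np.float32)
    pts = np.stack([x, profile.astype(np.float32)], axis=1)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    principal_axis = eigvecs[:, np.argmax(eigvals)]    # direction of largest variance
    return centroid, principal_axis
```

In this reading, artificially superimposed caption text tends to yield more regular coefficient profiles (flatter, smoother line graphs) than scene text, so features such as the centroid location and principal-axis direction of the profiles could feed a classifier; the exact decision rule used by the authors is not specified in the abstract.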