Data Preprocessing for Learning, Analyzing and Detecting Scene Text Video based on Rotational Gradient

Journal: Data · IF 2.2 · Q3 (Computer Science, Information Systems)
Publication date: 2021-04-05 · DOI: 10.1145/3460620.3460621
Manasa Devi Mortha, S. Maddala, V. Raju
{"title":"Data Preprocessing for Learning, Analyzing and Detecting Scene Text Video based on Rotational Gradient","authors":"Manasa Devi Mortha, S. Maddala, V. Raju","doi":"10.1145/3460620.3460621","DOIUrl":null,"url":null,"abstract":"Challenging annotated video datasets are in huge demand for the researchers and embedded industrials to learn and build an artificial intelligence for detecting, localizing and classifying the objects of interest aimed at various applications under pattern recognition and computer vision domain. It is very significant to produce those annotated sets to the respective communal. This paper focuses on text as annotated data in video for detection, localization, tracking and classification to solve several optical character recognition (OCR) based problems. Text is very essential in understanding the nature of the video because of diverse applications which are in renowned today like video retrieval and searching, driverless cars, industrial goods automation, geocoding and many more. Hence, it is important to understand how to create, prepare and load datasets to make ready for the machine to learn and understand. First, we have applied bilateral filter to preserve the edge information. Then, rotational gradient approach is proposed to detect the text in variable viewpoints. Later, the combination of morphology and contours has applied to generate blobs with bounding box around the detected regions by eradicating quasi text areas. The simulation results have shown better performance than traditional techniques with better detection rate on ICDAR Robust Reading Competition on Text in Video 2013-15 datasets.","PeriodicalId":36824,"journal":{"name":"Data","volume":"89 1","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2021-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1145/3460620.3460621","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Challenging annotated video datasets are in great demand among researchers and embedded-systems practitioners who need to train artificial intelligence for detecting, localizing and classifying objects of interest across applications in the pattern recognition and computer vision domains. Producing such annotated sets for these communities is therefore significant. This paper focuses on text as the annotated data in video for detection, localization, tracking and classification, with the aim of solving several optical character recognition (OCR) based problems. Text is essential to understanding the content of a video because of the diverse applications that rely on it today, such as video retrieval and search, driverless cars, industrial goods automation, geocoding and many more. Hence, it is important to understand how to create, prepare and load datasets so that a machine can learn from them. First, we apply a bilateral filter to preserve edge information. Then, a rotational gradient approach is proposed to detect text under variable viewpoints. Finally, a combination of morphology and contours is applied to generate blobs with bounding boxes around the detected regions while eradicating quasi-text areas. Simulation results show better performance than traditional techniques, with a higher detection rate on the ICDAR Robust Reading Competition Text in Video 2013-15 datasets.
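
The abstract outlines a three-stage preprocessing pipeline: bilateral filtering, rotational gradient text detection, and morphology-plus-contour blob generation. The sketch below is a minimal, hypothetical reconstruction of that pipeline in Python with OpenCV. The paper's exact rotational-gradient formulation is not given in the abstract, so it is approximated here by taking directional Sobel gradients over several rotated copies of the frame; the function name detect_text_regions, the kernel sizes, thresholds and quasi-text filtering heuristics are illustrative assumptions, not the authors' parameters.

```python
# Minimal sketch of the pipeline described in the abstract (assumptions noted inline).
import cv2
import numpy as np

def detect_text_regions(frame, angles=(0, 45, 90, 135)):
    """Return candidate text bounding boxes (x, y, w, h) for one BGR video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Step 1: bilateral filter smooths noise while preserving edge information.
    smoothed = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)

    # Step 2 (assumed approximation of the "rotational gradient"): rotate the frame,
    # take a directional Sobel gradient, rotate the response back, and keep the
    # per-pixel maximum over all angles so text at varied viewpoints responds strongly.
    h, w = smoothed.shape
    center = (w / 2, h / 2)
    response = np.zeros((h, w), dtype=np.float32)
    for angle in angles:
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(smoothed, rot, (w, h))
        gx = cv2.Sobel(rotated, cv2.CV_32F, 1, 0, ksize=3)
        inv = cv2.getRotationMatrix2D(center, -angle, 1.0)
        back = cv2.warpAffine(np.abs(gx), inv, (w, h))
        response = np.maximum(response, back)

    # Binarize the accumulated gradient response.
    response = cv2.normalize(response, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(response, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Step 3: morphology merges character strokes into word/line blobs; contours
    # place bounding boxes, and a simple heuristic drops quasi-text regions.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    blobs = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    boxes = []
    for cnt in contours:
        x, y, bw, bh = cv2.boundingRect(cnt)
        aspect = bw / float(bh)
        # Illustrative quasi-text filter: discard tiny or near-square blobs.
        if bw * bh > 300 and aspect > 1.2:
            boxes.append((x, y, bw, bh))
    return boxes
```

In use, such a routine would be called once per decoded frame, e.g. boxes = detect_text_regions(frame), with the returned boxes then drawn, tracked across frames, or passed to an OCR stage.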
Source journal: Data (Decision Sciences: Information Systems and Management)
CiteScore: 4.30 · Self-citation rate: 3.80% · Average review time: 10 weeks