Text Localization and Script Identification in Natural Scene Images and Videos

Chandana Udupa, Anusha Upadhyaya, Basanagoud S. Patil, S. Seeri, Prakashgoud Patil, P. Hiremath
DOI: 10.1109/CSI54720.2022.9924044
Published in: 2022 International Conference on Connected Systems & Intelligence (CSI)
Publication date: 2022-08-31
URL: https://doi.org/10.1109/CSI54720.2022.9924044
Citations: 0

Abstract

Text detection and script identification in natural scene images and videos have attracted considerable research attention in recent years because of their application in the design of computer vision devices for visually impaired people, tourists travelling in unfamiliar places, and others, helping them understand the textual information displayed on sign boards, billboards, public notice boards, and the like. The objective of the proposed method is the detection and localization of multilingual text in natural scene video images, together with identification of the corresponding script. Text in three languages, namely English, Hindi, and Kannada, is considered. In the proposed method, the CNN-based YOLOv5 is used for text detection and localization in real-time videos of natural scenes, and it is also trained for script identification. YOLOv5 is found to yield higher accuracy than other object detection algorithms. The proposed model is trained with a custom dataset containing video images of natural scenes and is tested in different scenarios, such as text with different backgrounds, fonts, orientations, and resolutions, and with disturbances in the images. The experimental results demonstrate the effectiveness and robustness of the proposed method. A performance comparison is made with other methods in the literature.
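One way to read the abstract is that a single detector handles both tasks by treating each script as its own object class, so that every predicted box carries a location and a script label at once. The sketch below illustrates only the post-processing side of that idea, in plain Python. It is not the authors' code: the class order in `SCRIPTS`, the `Detection` type, the `postprocess` and `label` helpers, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed class order; the abstract does not state the label indices used.
SCRIPTS = ["English", "Hindi", "Kannada"]


@dataclass
class Detection:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detector confidence in [0, 1]
    cls: int      # index into SCRIPTS


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets: List[Detection],
                conf_thres: float = 0.25,   # assumed threshold
                iou_thres: float = 0.45     # assumed threshold
                ) -> List[Detection]:
    """Drop low-confidence boxes, then apply greedy per-class NMS."""
    candidates = sorted(
        (d for d in dets if d.score >= conf_thres),
        key=lambda d: d.score,
        reverse=True,
    )
    kept: List[Detection] = []
    for d in candidates:
        # Keep d unless a higher-scoring box of the same script overlaps it.
        if all(k.cls != d.cls or iou(k, d) <= iou_thres for k in kept):
            kept.append(d)
    return kept


def label(d: Detection) -> str:
    """Human-readable tag for one localized text region."""
    return f"{SCRIPTS[d.cls]} {d.score:.2f}"


# Hypothetical raw predictions for one video frame.
raw = [
    Detection(10, 10, 110, 40, 0.90, 0),   # English sign
    Detection(12, 11, 108, 42, 0.60, 0),   # near-duplicate of the box above
    Detection(10, 60, 120, 95, 0.80, 2),   # Kannada sign
    Detection(200, 10, 260, 30, 0.10, 1),  # low-confidence Hindi guess
]
labels = [label(d) for d in postprocess(raw)]
print(labels)
```

In this toy frame the near-duplicate English box is suppressed and the low-confidence Hindi box is filtered out, leaving one English and one Kannada region. Suppressing per class rather than across classes is a design choice of this sketch; the abstract does not say which variant the authors used.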