An Impact of YOLOv5 on Text Detection and Recognition System using TesseractOCR in Images/Video Frames

2022 IEEE International Conference on Data Science and Information System (ICDSIS). Pub Date: 2022-07-29. DOI: 10.1109/ICDSIS55133.2022.9915927

Y. Chaitra, R. Dinesh, M. Jeevan, M. Arpitha, V. Aishwarya, K. Akshitha


Citations: 2

Abstract

Text detection and recognition in images and videos are significant research areas in computer vision. Computer vision is used for real-time traffic monitoring in smart cities, where a security camera can simultaneously record the license plate information of suspect vehicles. The challenging task here is detecting text that is arbitrarily oriented, as in aerial photographs and scene text. Most existing text detection and recognition methods are designed for images with a clear background and near-horizontal text, so they are not effective on complex images and video streams. To address this issue, we propose a system that detects text regions using the YOLOv5s model, which trains effectively on small-scale images, and the YOLOv5x model for large-scale images. TesseractOCR then recognizes the detected text by converting each detected region to a string, which is stored in CSV format. Experiments were carried out on ICDAR2013, ICDAR2015, and YVT images/frames. The results indicate that the proposed method using YOLOv5x detects text in images/video frames with reasonably good accuracy, and that the TesseractOCR recognition rate is adequate for near-horizontal text.
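The detect, recognize, and store flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Box` type, the `pick_variant` and `boxes_to_csv` helpers, the 1280-pixel size cutoff, the 0.25 confidence threshold, and the CSV column names are all hypothetical. The detector and the OCR engine are deliberately left outside the sketch: detections arrive as plain boxes, and recognition is passed in as a callable.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned text region in pixel coordinates (hypothetical type)."""
    x1: int
    y1: int
    x2: int
    y2: int
    score: float  # detector confidence in [0, 1]


def pick_variant(width: int, height: int, large_side: int = 1280) -> str:
    """Choose a YOLOv5 variant name from the image size.

    The cutoff is an illustrative assumption; the abstract only says that
    YOLOv5s is used for small-scale and YOLOv5x for large-scale images.
    """
    return "yolov5x" if max(width, height) >= large_side else "yolov5s"


def boxes_to_csv(
    frame_id: str,
    boxes: Iterable[Box],
    recognize: Callable[[Box], str],
    min_score: float = 0.25,
) -> str:
    """Run OCR on each confident box and return the rows as CSV text.

    `recognize` stands in for the OCR step (crop the region, convert the
    crop to a string). The threshold and column layout are assumptions.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["frame", "x1", "y1", "x2", "y2", "score", "text"])
    for box in boxes:
        if box.score < min_score:
            continue  # skip low-confidence detections
        text = recognize(box).strip()  # OCR output often ends in whitespace
        writer.writerow(
            [frame_id, box.x1, box.y1, box.x2, box.y2, f"{box.score:.2f}", text]
        )
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in detections and OCR so the sketch runs without any model.
    demo_boxes = [Box(10, 20, 110, 60, 0.90), Box(0, 0, 5, 5, 0.10)]
    print(pick_variant(1920, 1080))
    print(boxes_to_csv("frame_0001", demo_boxes, lambda box: " STOP \n"))
```

In a real pipeline, `recognize` would crop the frame to the box and hand the crop to a Tesseract binding; the abstract names TesseractOCR but does not say which binding or settings were used, so that choice is left open here.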


