{"title":"基于网络流的统一视频文本检测方法","authors":"Xue-Hang Yang, Wenhao He, Fei Yin, Cheng-Lin Liu","doi":"10.1109/ICDAR.2017.62","DOIUrl":null,"url":null,"abstract":"Scene text detection in videos has many application needs but has drawn less attention than that in images. Existing methods for video text detection perform unsatisfactorily because of the insufficient utilization of spatial and temporal information. In this paper, we propose a novel video text detection method with network flow based tracking. The system first applies a newly proposed Fully Convolutional Neural Network (FCN) based scene text detection method to detect texts in individual frames and then track proposals in adjacent frames with a motion-based method. Next, the text association problem is formulated into a cost-flow network and text trajectories are derived from the network with a min-cost flow algorithm. At last, the trajectories are post-processed to improve the precision accuracy. The method can detect multi-oriented scene text in videos and incorporate spatial and temporal information efficiently. Experimental results show that the method improves the detection performance remarkably on benchmark datasets, e.g., by a 15.66% increase of ATA Average Tracking Accuracy) on ICDAR video scene text dataset.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A Unified Video Text Detection Method with Network Flow\",\"authors\":\"Xue-Hang Yang, Wenhao He, Fei Yin, Cheng-Lin Liu\",\"doi\":\"10.1109/ICDAR.2017.62\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Scene text detection in videos has many application needs but has drawn less attention than that in images. Existing methods for video text detection perform unsatisfactorily because of the insufficient utilization of spatial and temporal information. In this paper, we propose a novel video text detection method with network flow based tracking. The system first applies a newly proposed Fully Convolutional Neural Network (FCN) based scene text detection method to detect texts in individual frames and then track proposals in adjacent frames with a motion-based method. Next, the text association problem is formulated into a cost-flow network and text trajectories are derived from the network with a min-cost flow algorithm. At last, the trajectories are post-processed to improve the precision accuracy. The method can detect multi-oriented scene text in videos and incorporate spatial and temporal information efficiently. 
Experimental results show that the method improves the detection performance remarkably on benchmark datasets, e.g., by a 15.66% increase of ATA Average Tracking Accuracy) on ICDAR video scene text dataset.\",\"PeriodicalId\":433676,\"journal\":{\"name\":\"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2017.62\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2017.62","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Unified Video Text Detection Method with Network Flow
Abstract: Scene text detection in videos is needed in many applications but has drawn less attention than text detection in still images. Existing methods for video text detection perform unsatisfactorily because they make insufficient use of spatial and temporal information. In this paper, we propose a novel video text detection method with network-flow-based tracking. The system first applies a newly proposed Fully Convolutional Neural Network (FCN) based scene text detection method to detect text in individual frames, and then tracks the resulting proposals across adjacent frames with a motion-based method. Next, the text association problem is formulated as a cost-flow network, and text trajectories are derived from the network with a min-cost flow algorithm. Finally, the trajectories are post-processed to improve precision. The method can detect multi-oriented scene text in videos and incorporates spatial and temporal information efficiently. Experimental results show that the method improves detection performance remarkably on benchmark datasets, e.g., a 15.66% increase in ATA (Average Tracking Accuracy) on the ICDAR video scene text dataset.
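To make the cost-flow formulation concrete, the sketch below shows one minimal way per-frame text detections could be linked into trajectories with a min-cost flow solver. It follows the general network-flow tracking idea rather than the paper's exact cost design: the entry/exit costs, the IoU-based link cost, and the helper names (iou, associate) are illustrative assumptions, and the example relies on networkx's min_cost_flow.

```python
# Illustrative min-cost flow association sketch (not the authors' implementation).
# Assumptions: detections per frame are axis-aligned boxes with a confidence score;
# all cost terms below are hypothetical placeholders.
import networkx as nx


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter + 1e-9)


def associate(detections, num_tracks, entry_cost=30, exit_cost=30):
    """detections: list over frames, each a list of (box, score).
    Returns the networkx min-cost flow dict; one unit of flow on a link
    edge (('v', t, i), ('u', t+1, j)) means detection i in frame t is
    associated with detection j in frame t+1."""
    G = nx.DiGraph()
    G.add_node('S', demand=-num_tracks)  # source supplies num_tracks trajectories
    G.add_node('T', demand=num_tracks)   # sink absorbs them
    for t, frame in enumerate(detections):
        for i, (box, score) in enumerate(frame):
            u, v = ('u', t, i), ('v', t, i)
            # Observation edge: more confident detections are cheaper to use.
            G.add_edge(u, v, capacity=1, weight=int(-100 * score))
            G.add_edge('S', u, capacity=1, weight=entry_cost)  # start a trajectory here
            G.add_edge(v, 'T', capacity=1, weight=exit_cost)   # end a trajectory here
            if t + 1 < len(detections):
                for j, (box2, _) in enumerate(detections[t + 1]):
                    overlap = iou(box, box2)
                    if overlap > 0.3:  # only link spatially close detections
                        G.add_edge(v, ('u', t + 1, j), capacity=1,
                                   weight=int(100 * (1 - overlap)))
    return nx.min_cost_flow(G)


# Toy usage: two frames, one text box drifting slightly to the right.
dets = [[((10, 10, 60, 30), 0.9)], [((14, 10, 64, 30), 0.8)]]
flow = associate(dets, num_tracks=1)
print(flow[('v', 0, 0)])  # one unit of flow into ('u', 1, 0): the boxes are linked
```

Because every edge has unit capacity and the source/sink demands fix the number of trajectories, the solver trades off the cost of starting or ending a track against the (negative) reward of covering confident detections and the cost of each link, which is how the flow solution encodes text trajectories.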