Title: A Video Text Detection and Tracking System
Authors: Tuoerhongjiang Yusufu, Yiqing Wang, Xiangzhong Fang
Venue: 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)
Volume: 25 1
Pages: 522-529
Published: 2013-12-09
DOI: 10.1109/ISM.2013.106 (https://doi.org/10.1109/ISM.2013.106)
Citations: 19
With video databases growing ever larger, retrieving videos quickly and efficiently has become a crucial problem. Video text, which carries high-level semantic information, is an important source of information for this task. In this paper, we introduce a video text detection and tracking approach. With these methods we obtain clear binary text images that can be processed directly by OCR (Optical Character Recognition) software. Our approach consists of two parts: a stroke-model-based video text detection and localization method, and a SURF (Speeded Up Robust Features)-based text region tracking method. In the detection and localization method, we use a stroke model and morphological operations to roughly identify candidate text regions, then combine the stroke map and the edge response to localize text lines in each candidate region. Several heuristics and an SVM (Support Vector Machine) are used to verify text blocks. The core of our text tracking method is a fast approximate nearest-neighbour search over the extracted SURF features. The text-ending frame is determined from the number of SURF feature points, while text motion is estimated from the correct matches between adjacent frames. Experimental results on a large number of different video clips show that our approach can effectively detect and track both static and scrolling text.
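
The two tracking decisions described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes the SURF keypoints have already been extracted and matched, and it takes plain coordinate pairs and per-frame feature counts as input. The function names, the median-displacement estimator, and the `min_ratio` threshold are all hypothetical choices made for illustration; the abstract does not state which estimator or threshold is used.

```python
from statistics import median


def estimate_text_motion(matches):
    """Estimate the (dx, dy) shift of a text region between adjacent frames.

    `matches` is a list of ((x_prev, y_prev), (x_curr, y_curr)) pairs, one per
    matched keypoint. The median displacement is an assumed estimator, chosen
    so that a few wrong matches have little effect on the result.
    """
    if not matches:
        return None
    dx = median(curr[0] - prev[0] for prev, curr in matches)
    dy = median(curr[1] - prev[1] for prev, curr in matches)
    return dx, dy


def find_text_ending_frame(feature_counts, min_ratio=0.3):
    """Return the index of the first frame where the text appears to have ended.

    `feature_counts[i]` is the number of feature points found in the tracked
    region in frame i. The text is assumed to have ended once the count drops
    below `min_ratio` times the count in the first frame (a hypothetical rule).
    Returns None if the count never drops that far.
    """
    if not feature_counts:
        return None
    threshold = min_ratio * feature_counts[0]
    for index, count in enumerate(feature_counts):
        if count < threshold:
            return index
    return None


if __name__ == "__main__":
    # Three consistent matches of a leftward-scrolling line plus one outlier.
    matches = [
        ((10, 50), (7, 50)),
        ((20, 52), (17, 52)),
        ((30, 48), (27, 48)),
        ((40, 50), (90, 10)),
    ]
    print(estimate_text_motion(matches))
    print(find_text_ending_frame([40, 38, 41, 35, 9, 2]))
```

With static text the estimated shift stays near zero, while scrolling text yields a steady non-zero shift that can be used to move the tracked region from frame to frame.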