An Edge-Based Approach for Video Text Extraction
Authors: Shi Jianyong, Luo Xiling, Zhang Jun
Published in: 2009 International Conference on Computer Technology and Development, 2009-11-13
DOI: 10.1109/ICCTD.2009.177 (https://doi.org/10.1109/ICCTD.2009.177)

Text in video is a compact but effective clue for video indexing and summarization. In this paper, we propose an edge-based video text extraction approach with low computational cost that automatically detects and extracts text from complex video frames. We first detect the edge maps of both an intensity image and its binarized version and merge the two into a single edge map, which contains fewer background edge pixels and more text edge pixels. The projection profile method is then used to evaluate the distribution of the resulting edge map in the horizontal and vertical directions. In each direction, an adaptive thresholding method identifies the adjacent pixel rows and columns that contain text, and the intersections of these rows and columns are extracted as text regions. Finally, a novel extraction method based on the monochromatism of text is applied to these regions, and its output can be fed directly to OCR. We demonstrate the performance of the approach with experimental results on a set of video clips and static images.
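
The localization step described above (projection profiles, thresholding, and row/column intersection) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the adaptive threshold rule, so the sketch assumes a threshold equal to `scale` times the mean of each profile. The names `projection_profiles`, `text_runs`, `text_regions`, and `scale` are hypothetical, and the input is assumed to be an already merged binary edge map.

```python
from typing import List, Tuple

EdgeMap = List[List[int]]  # 1 = edge pixel, 0 = non-edge pixel


def projection_profiles(edges: EdgeMap) -> Tuple[List[int], List[int]]:
    """Count edge pixels in every pixel row and every pixel column."""
    row_profile = [sum(row) for row in edges]
    col_profile = [sum(col) for col in zip(*edges)]
    return row_profile, col_profile


def text_runs(profile: List[int], scale: float = 1.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) runs of adjacent indices above threshold.

    ASSUMPTION: the threshold is scale * mean(profile). The source only says
    the threshold is adaptive, so this rule is a stand-in.
    """
    if not profile:
        return []
    threshold = scale * sum(profile) / len(profile)
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i  # a new run of candidate text rows/columns begins
        elif start is not None:
            runs.append((start, i - 1))  # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))  # run reaches the last index
    return runs


def text_regions(edges: EdgeMap, scale: float = 1.0) -> List[Tuple[int, int, int, int]]:
    """Intersect row runs and column runs into (top, left, bottom, right) boxes."""
    row_profile, col_profile = projection_profiles(edges)
    return [
        (top, left, bottom, right)
        for top, bottom in text_runs(row_profile, scale)
        for left, right in text_runs(col_profile, scale)
    ]


# Toy edge map: a dense block in rows 2-3, columns 1-5, plus one noise pixel.
demo = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
print(text_regions(demo))  # prints [(2, 1, 3, 5)]
```

In the toy input the isolated noise pixel falls below both mean-based thresholds, so only the dense block is reported. A real frame would need the edge detection, edge map merging, and monochromatism-based extraction stages that this sketch leaves out.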