Multiscale Stroke-Based Page Segmentation Approach
Mehdi Felhi, S. Tabbone, Maria V. Ortiz Segovia
2014 11th IAPR International Workshop on Document Analysis Systems, 2014-04-07
DOI: 10.1109/DAS.2014.68
In this paper we present a new hybrid page segmentation approach based on connected-component and region analysis. We first describe our stroke descriptor, which detects candidate text and line components from the skeleton of the binarized document image. Then, an active contour model is applied to segment the rest of the image into photo and background regions. This classification is verified by analyzing the variation within each detected region. Finally, we cluster the text candidates by size using mean-shift analysis, and we present an adaptive projection-profile approach that groups horizontal and vertical text regions separately. The method is applied to realistic scanned document images (newspapers and magazines) containing text, line, and photo regions. We evaluate the performance of our approach by comparing it with the existing methods that participated in the ICDAR page segmentation competition.
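The last step of the abstract combines two standard building blocks: mean-shift clustering of text candidates by size, and projection profiles for grouping text regions. The following is a minimal pure-Python sketch of both. It is not the authors' implementation, and it rests on several assumptions. A component's "size" is taken to be its bounding-box height. The clustering uses a Gaussian kernel with a hand-picked bandwidth. The profile splitter uses a fixed minimum gap, because the paper's adaptive rule is not described in the abstract. All function names (`mean_shift_1d`, `cluster_by_size`, `row_profile`, `col_profile`, `split_profile`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_shift_1d(values: Sequence[float], bandwidth: float,
                  tol: float = 1e-6, max_iter: int = 300) -> List[float]:
    """Move every value uphill on a Gaussian kernel density estimate until it settles on a mode."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    modes = []
    for x in values:
        m = float(x)
        for _ in range(max_iter):
            # Gaussian kernel weights of all samples relative to the current estimate.
            weights = [math.exp(-0.5 * ((v - m) / bandwidth) ** 2) for v in values]
            new_m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
            converged = abs(new_m - m) < tol
            m = new_m
            if converged:
                break
        modes.append(m)
    return modes


def cluster_by_size(sizes: Sequence[float], bandwidth: float) -> Tuple[List[float], List[int]]:
    """Group component sizes (assumed: bounding-box heights) by the mode they converge to."""
    centers: List[float] = []
    labels: List[int] = []
    for m in mean_shift_1d(sizes, bandwidth):
        # Modes closer than half a bandwidth are treated as the same cluster.
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


def row_profile(mask: Sequence[Sequence[int]]) -> List[int]:
    """Horizontal projection profile: ink pixels per row of a binary mask."""
    return [sum(row) for row in mask]


def col_profile(mask: Sequence[Sequence[int]]) -> List[int]:
    """Vertical projection profile: ink pixels per column of a binary mask."""
    return [sum(col) for col in zip(*mask)]


def split_profile(profile: Sequence[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) runs of non-empty bins.

    Runs separated by fewer than `min_gap` empty bins are merged. This is a
    fixed threshold, not the adaptive rule described in the paper.
    """
    segments: List[Tuple[int, int]] = []
    start, end, gap = None, -1, 0
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
            end, gap = i, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, end))
                start, gap = None, 0
    if start is not None:
        segments.append((start, end))
    return segments
```

As an example, heights `[10, 11, 12, 10, 11, 30, 31, 29, 32]` with a bandwidth of 3 fall into two size groups (labels `[0, 0, 0, 0, 0, 1, 1, 1, 1]`), for instance body text and headline-sized text. Each group could then be passed to the splitter, using row profiles for horizontal text and column profiles for vertical text. Clustering before profiling keeps large headline glyphs from distorting the gap statistics of smaller body text.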