Background Line Detection with A Stochastic Model
Yefeng Zheng, Huiping Li, D. Doermann
2003 Conference on Computer Vision and Pattern Recognition Workshop, 2003-06-16
DOI: 10.1109/CVPRW.2003.10029
Citations: 3
Abstract
Background lines often exist in textual documents. It is important to detect and remove those lines so that text can be easily segmented and recognized. This paper proposes a stochastic model that incorporates high-level contextual information to detect severely broken lines. We observed that 1) background lines are parallel, and 2) the vertical gaps between any two neighboring lines are roughly equal, with small variance. The novelty of our algorithm is that we use an HMM to model the projection profile along the estimated skew angle, and estimate the optimal positions of all background lines simultaneously with the Viterbi algorithm. Compared with our previous deterministic model-based approach [15], the new method is much more robust and correctly detects about 96.8% of the background lines in our Arabic document database.
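The idea of choosing all line positions jointly can be illustrated with a small Viterbi-style dynamic program over a one-dimensional projection profile. This is only a minimal sketch under stated assumptions, not the formulation used in the paper: the function name `detect_lines`, the Gaussian gap model, the Gaussian emission model, and all parameters (`gap_mean`, `gap_std`, `mu_line`, `mu_bg`, `noise_std`) are hypothetical choices made for illustration, and the profile is assumed to be already computed along the estimated skew angle.

```python
import numpy as np

def detect_lines(profile, gap_mean, gap_std, mu_line, mu_bg, noise_std):
    """Pick line rows jointly with a Viterbi-style dynamic program.

    All model choices here are illustrative assumptions, not the paper's.
    """
    profile = np.asarray(profile, dtype=float)
    n = len(profile)
    # Emission score: log-likelihood ratio of "line row" vs "background row"
    # under equal-variance Gaussians (assumed model).
    emit = ((profile - mu_bg) ** 2 - (profile - mu_line) ** 2) / (2 * noise_std ** 2)
    # Allowed gaps: integers within three standard deviations of the mean gap.
    k_min = max(1, int(round(gap_mean - 3 * gap_std)))
    k_max = int(round(gap_mean + 3 * gap_std))
    gaps = np.arange(k_min, k_max + 1)
    # Transition score: log-density of a Gaussian gap (assumed model).
    log_trans = (-0.5 * ((gaps - gap_mean) / gap_std) ** 2
                 - np.log(gap_std * np.sqrt(2 * np.pi)))
    best = emit.copy()        # best[y]: score of the best chain ending at row y
    prev = np.full(n, -1)     # back-pointer to the previous line row, -1 if none
    for y in range(n):
        for k, lt in zip(gaps, log_trans):
            if y - k < 0:
                break         # gaps are ascending, so larger gaps also fall off
            cand = best[y - k] + lt + emit[y]
            if cand > best[y]:
                best[y] = cand
                prev[y] = y - k
    # Backtrack from the best-scoring final line.
    y = int(np.argmax(best))
    lines = []
    while y >= 0:
        lines.append(y)
        y = int(prev[y])
    return lines[::-1]

# Synthetic profile: background level 2, lines every 20 rows,
# with the line at row 50 severely broken (weak response).
profile = np.full(100, 2.0)
profile[[10, 30, 70, 90]] = 40.0
profile[50] = 8.0
print(detect_lines(profile, gap_mean=20, gap_std=1,
                   mu_line=40, mu_bg=2, noise_std=10))
```

In this toy example the row at 50 looks more like background than like a line when judged alone, but the regular-gap prior makes the chain through it score higher than any chain that stops early, which is the kind of contextual reasoning the abstract describes for severely broken lines.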