Document Image Quality Assessment Using Discriminative Sparse Representation
Xujun Peng, Huaigu Cao, P. Natarajan
2016 12th IAPR Workshop on Document Analysis Systems (DAS), 2016-04-11
DOI: 10.1109/DAS.2016.24 (https://doi.org/10.1109/DAS.2016.24)
Citations: 14
Abstract
Document image quality assessment (DIQA) aims to build a computational model that predicts the degree of degradation of document images. Using the estimated quality scores, document processing and analysis systems can provide immediate feedback, which helps to maintain, organize, recognize, and retrieve information from document images. Recently, bag-of-visual-words (BoV) based approaches have attracted increasing attention for quality assessment, but how to use BoV to represent images more accurately remains a challenging problem. In this paper, we propose a sparse representation based method that estimates a document image's quality with respect to OCR capability. Unlike conventional sparse representation approaches, we introduce the target quality scores into the training phase of the sparse representation. The proposed method improves the discriminability of the system and ensures that the learned codebook is better suited to the assessment task. Experimental results on a public dataset show that the proposed method outperforms other hand-crafted and BoV based DIQA approaches.
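The idea of introducing target quality scores into the training phase can be illustrated with a generic supervised dictionary learning objective. The form below is a sketch under stated assumptions, not necessarily the formulation used in the paper: all symbols are hypothetical, with X = [x_1, ..., x_N] the local image features, D the codebook, A = [\alpha_1, ..., \alpha_N] the sparse codes, y_i the target quality score (for example, OCR accuracy), w a linear regressor, and \lambda, \gamma trade-off weights.

```latex
% Assumed illustrative objective; every symbol here is hypothetical and not taken from the paper.
\min_{D,\,A,\,w}\;
  \underbrace{\|X - DA\|_F^2}_{\text{reconstruction}}
  + \lambda \underbrace{\sum_{i=1}^{N} \|\alpha_i\|_1}_{\text{sparsity}}
  + \gamma \underbrace{\sum_{i=1}^{N} \left(y_i - w^\top \alpha_i\right)^2}_{\text{quality-score fit}}
```

With \gamma = 0 this reduces to conventional unsupervised sparse coding. A positive \gamma pushes the codebook toward codes that also predict the quality score, which is one way a learned codebook could become more discriminative for the assessment task.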