DAR '12: Latest Papers

Bangla date field extraction in offline handwritten documents
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432561
Ranju Mandal, P. Roy, U. Pal
Abstract: Dates are useful information for various applications (e.g., date-wise document indexing), and automatic extraction of date information involves difficult challenges due to the writing styles of different individuals, touching characters, and confusion among numerals, punctuation, and text. In this paper, we present a framework for indexing/retrieval of Bangla date patterns from handwritten documents. The method first classifies the word components of each text line into month and non-month classes using word-level features. Next, non-month words are segmented into individual components, and each component is classified as text, digit, or punctuation. Using this word- and character-level information, date patterns are searched: first a voting approach and then regular expressions are used to detect candidate lines for numeric and semi-numeric dates. Dynamic Time Warping (DTW) matching of profile-based features is used to classify month/non-month words. Numerals and punctuation are classified using gradient-based features and an SVM classifier. Experiments are performed on a Bangla handwritten dataset, and the results demonstrate the effectiveness of the proposed system.
Citations: 1
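The abstract names two concrete mechanisms: DTW matching of profile features to separate month from non-month words, and regular expressions over classified components to find candidate date lines. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The feature sequences, the threshold rule in `is_month_word`, and the one-letter symbol encoding (D, P, M, T) with its two patterns are hypothetical choices made for illustration, and the voting step is not shown.

```python
import re

import numpy as np


def dtw_distance(a, b):
    """Classic DTW distance between two feature sequences.

    Each sequence is (length, dims), e.g. one profile vector per image column;
    1-D input is treated as a single feature per step.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j]: best cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],      # advance in a only
                                 cost[i, j - 1],      # advance in b only
                                 cost[i - 1, j - 1])  # advance in both
    return float(cost[n, m])


def is_month_word(features, month_templates, threshold):
    """Hypothetical rule: 'month' if the nearest month template is within threshold."""
    return min(dtw_distance(features, t) for t in month_templates) <= threshold


# Hypothetical line encoding, one symbol per classified component:
#   D = digit, P = punctuation, M = month word, T = other text
NUMERIC_DATE = re.compile(r"D{1,2}PD{1,2}PD{2,4}")    # e.g. 16/12/2012
SEMI_NUMERIC_DATE = re.compile(r"D{1,2}P?MP?D{2,4}")  # e.g. 16 <month> 2012


def is_candidate_date_line(symbols):
    """True if the line's symbol string contains a numeric or semi-numeric date."""
    return bool(NUMERIC_DATE.search(symbols) or SEMI_NUMERIC_DATE.search(symbols))
```

Because the regular expressions operate on class symbols rather than on recognized glyphs, the same patterns apply however the Bangla numerals or month names are written; only the upstream classifiers need to be script-specific.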
A data acquisition and analysis system for palm leaf documents in Telugu
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432578
P. N. Sastry, R. Krishnan
Abstract: This paper briefly reviews progress in handwritten character recognition (HWCR) applied to Indian languages, with special emphasis on palm leaf character recognition (PLCR) techniques. Various methodologies and techniques for character recognition (CR) are discussed. HWCR applied to historical documents such as palm leaves and old handwritten manuscripts is much more challenging, as progress in this area has been limited. These documents, containing texts and treatises on a host of subjects, are of both national and historical importance. Characters on palm leaves have additional properties such as depth, an added feature that can be gainfully exploited during PLCR. The paper describes a unique method of data collection, starting with isolated Telugu characters from palm leaf manuscripts, and the building of a palm leaf character database. A comparative analysis of PLCR results obtained by various techniques is also presented.
Citations: 5
Benchmarking recognition results on camera captured word image data sets
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432572
D. Kumar, M. Prasad, A. Ramakrishnan
Abstract: We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera-captured scene images, born-digital images (BDI), and street view images. Using a Matlab-based tool that we developed, we have annotated more than 3600 word images from the five data sets at the pixel level. The word images binarized by the tool, as well as those binarized by our own midline analysis and propagation of segmentation (MAPS) algorithm, are recognized using the trial version of Nuance Omnipage OCR, and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on the ICDAR 2003, Sign evaluation, Street view, Born-digital, and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5%, and 86.7%, respectively. Without the use of any lexicon, MAPS-binarized word images yield 64.5% and 71.7% on ICDAR 2003 and 2011, respectively, higher than the best values reported in the literature (61.1% and 41.2%, respectively). The MAPS result of 82.8% on the BDI 2011 dataset matches the performance of the state-of-the-art method based on a power-law transform.
Citations: 14
Assamese online handwritten digit recognition system using hidden Markov models
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432573
G. S. Reddy, Bandita Sarma, R. Naik, S. Prasanna, C. Mahanta
Abstract: This work describes the development of an Assamese online handwritten digit recognition system. Assamese numerals are the same as Bangla numerals. A large database of handwritten numerals is collected and partitioned into two parts of equal size. The first part is used to develop Hidden Markov Model (HMM) based digit models, with the (x, y) coordinates and their first and second time derivatives as features. The second part of the database is tested against the models to evaluate performance. The digit recognition system achieves an average recognition performance of 96.02%. Considerable confusion is observed between numerals 5 and 6, so a new distance feature is added and the models are retrained. The performance for numerals 5 and 6 changes from 91.60% and 95.40% to 95.30% and 94.90%, respectively. As a result, the confusion is significantly reduced and the average recognition performance increases to 97.14%.
Citations: 24
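The per-sample feature set described in this abstract (pen coordinates plus their first and second time derivatives) can be computed in a few lines of NumPy. This is a minimal sketch: the derivative scheme, the absence of any coordinate normalisation, and the function name are assumptions, and neither the added distance feature nor the HMM training is shown.

```python
import numpy as np


def online_digit_features(points):
    """Per-sample features for an online pen trajectory.

    points: (T, 2) array of (x, y) pen positions in time order, T >= 2.
    Returns (T, 6): x, y, dx/dt, dy/dt, d2x/dt2, d2y/dt2, with time measured
    in samples. np.gradient uses central differences in the interior and
    one-sided differences at the ends (an assumed choice).
    """
    p = np.asarray(points, dtype=float)
    d1 = np.gradient(p, axis=0)   # first time derivative
    d2 = np.gradient(d1, axis=0)  # second time derivative
    return np.hstack([p, d1, d2])
```

Each digit sample then becomes a (T, 6) observation sequence. In a typical setup, one HMM per digit is trained on such sequences and a test sample is assigned to the model with the highest likelihood.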
Offline handwritten word recognition in Hindi
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432563
R. Sitaram, Shrang Jain, Hariharan Ravishankar
Abstract: This paper discusses the Hindi offline handwritten word recognizer (HWR) that we are developing. For training and testing the offline HWR, we have created a Hindi handwritten word and character database from 100 writers. Our HWR first segments the test word image into probable characters and then uses a two-pass Dynamic Programming algorithm to match the test word against each word in the lexicon. We extract directional element features (DEF) from each character image segment and model them statistically. We currently achieve word recognition accuracies of 91.23% to 79.94% for vocabulary sizes of 10 to 30 words.
Citations: 14
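Matching a word image against each lexicon entry after over-segmenting it into probable characters is commonly done with a lexicon-driven dynamic program that merges consecutive segments into characters. The sketch below shows a generic single-pass version of that idea; it is not the paper's two-pass algorithm. `char_cost` is a hypothetical stand-in for a statistical model scoring DEF features of the merged segments, and the limit of 3 segments per character is also an assumption.

```python
import math
from typing import Callable, Sequence


def match_word(n_segments: int, word: Sequence[str],
               char_cost: Callable[[int, int, str], float],
               max_group: int = 3) -> float:
    """Lowest cost of reading segments [0, n_segments) as the characters of word.

    Consecutive segments are merged into groups of 1..max_group, one group per
    character. char_cost(i, j, c) is a hypothetical classifier cost of reading
    merged segments i..j-1 as character c (lower is better).
    """
    k = len(word)
    # best[t][j]: cost of matching the first t characters to the first j segments
    best = [[math.inf] * (n_segments + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for t in range(1, k + 1):
        for j in range(1, n_segments + 1):
            for g in range(1, min(max_group, j) + 1):  # size of the last group
                prev = best[t - 1][j - g]
                if prev < math.inf:
                    cand = prev + char_cost(j - g, j, word[t - 1])
                    best[t][j] = min(best[t][j], cand)
    return best[k][n_segments]


def recognize_word(n_segments, lexicon, char_cost):
    """Return the lexicon entry with the lowest matching cost."""
    return min(lexicon, key=lambda w: match_word(n_segments, w, char_cost))
```

Each lexicon word costs O(k · n · max_group) calls to `char_cost`, where k is the word length and n the number of segments, so the total work grows linearly with the lexicon size.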
Development of an Assamese OCR using Bangla OCR
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432566
Subhankar Ghosh, P. Bora, Sanjib Das, B. Chaudhuri
Abstract: This paper describes the development of an OCR for the Assamese language by modifying an existing OCR for Bangla. This modification is feasible because the Assamese script is similar to the Bangla script except for a few characters. The OCR incorporates a two-stage recognizer using an SVM classifier, with no post-processing. A spell-checker capable of detecting most errors and interactively recommending corrections is implemented. The OCR is tested on about 1800 pages of good-quality printed documents, achieving an accuracy of about 97%.
Citations: 10
Line segmentation of handwritten Gurmukhi manuscripts
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432568
S. Jindal, Gurpreet Singh Lehal
Abstract: The development of an OCR system for recognizing old handwritten Gurmukhi manuscripts is a complex task involving many difficulties. Historical documents are affected by ageing, repeated use, and many other uncontrollable factors. Segmentation is one of the most important phases of an OCR, since OCR accuracy depends on the accuracy of segmentation. The writing styles of historical documents make segmentation extremely difficult. Segmentation includes line, word, and character segmentation. In this paper, we discuss a method for segmenting lines in handwritten Gurmukhi manuscripts.
Citations: 13
A syntactic PR approach to Telugu handwritten character recognition
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432579
Samita Pradhan, A. Negi
Abstract: This paper presents a character recognition mechanism based on a syntactic pattern recognition (PR) approach that uses the trie data structure for efficient recognition and approximate string matching for classification. During preprocessing, an input character image is skeletonized, and discrete curves are found using a 3 x 3 pixel region. A trie, which we call a sequence trie, is used for lookup at a lower level to encode a discrete curve pattern of pixels. The sequence of such discrete curves from the input pattern is looked up in the sequence trie. Encoding several such sequence numbers for the thinned character constructs a pattern string. Approximate string matching is used to compare the encoded pattern string of a template character with the pattern string obtained from the input character. We use approximate rather than exact string matching to make the approach robust to noise. Another trie data structure (called the pattern trie) is used for efficient storage and retrieval during approximate string matching. We use the trie because lookup takes O(m) time in the worst case, where m is the length of the longest string in the trie. For approximate string matching, we use look-ahead with a branch-and-bound scheme in the trie. We demonstrate the method on 43 characters from the basic Telugu character set. The proposed approach recognized all the test characters given here correctly; however, more extensive testing on realistic data is required.
Citations: 9
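A pattern trie with approximate lookup can be sketched by walking the trie while keeping one row of the Levenshtein edit-distance table per node, and pruning any branch whose row minimum cannot beat the best match found so far. This is a standard branch-and-bound formulation chosen for illustration; the abstract does not specify the paper's look-ahead scheme or cost model, and the class and method names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # character class if a pattern ends here


class PatternTrie:
    """Trie of encoded pattern strings with bounded approximate lookup."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, pattern: str, label: str) -> None:
        node = self.root
        for symbol in pattern:  # O(m) steps for a pattern of length m
            node = node.children.setdefault(symbol, TrieNode())
        node.label = label

    def search(self, query: str, max_cost: int) -> Optional[Tuple[str, int]]:
        """Closest stored pattern (label, edit distance) within max_cost, else None."""
        best: List = [None, max_cost + 1]  # [label, cost]; only strictly better costs win
        self._walk(self.root, list(range(len(query) + 1)), query, best)
        return None if best[0] is None else (best[0], best[1])

    def _walk(self, node: TrieNode, row: List[int], query: str, best: List) -> None:
        # row[j] = edit distance between the prefix spelled by node and query[:j]
        if node.label is not None and row[-1] < best[1]:
            best[0], best[1] = node.label, row[-1]
        # Bound: no descendant can score below min(row), so prune this branch.
        if min(row) >= best[1]:
            return
        for symbol, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                new_row.append(min(
                    new_row[j - 1] + 1,                     # extra query symbol
                    row[j] + 1,                             # extra pattern symbol
                    row[j - 1] + (query[j - 1] != symbol),  # match or substitution
                ))
            self._walk(child, new_row, query, best)
```

For example, after inserting the made-up patterns "1234" (label "ka") and "789" (label "ga"), `search("1244", 1)` returns `("ka", 1)`, while `search("000", 1)` returns `None` because no stored pattern lies within one edit.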
An empirical intrinsic mode based characterization of Indian scripts
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432575
Kavita Bhardwaj, S. Chaudhury, Sumantra Dutta Roy
Abstract: In this paper, we describe a novel technique for document script identification (DSI) in printed documents using Empirical Mode Decomposition (EMD). EMD adaptively decomposes script images into a series of modes representing different local features of the images. In this method, Radon-transformed script images are decomposed into a finite set of IMFs (Intrinsic Mode Functions). The energy concentration in a particular orientation characterizes a script's texture, as it indicates the dominance of an individual script in that direction. We demonstrate how the proposed method uses these IMFs as feature vectors to distinguish various scripts.
Citations: 1
Recognition of Kannada characters extracted from scene images
DAR '12 Pub Date : 2012-12-16 DOI: 10.1145/2432553.2432557
D. Kumar, A. Ramakrishnan
Abstract: In this paper, we describe a method for feature extraction and classification of characters manually isolated from scene or natural images. Characters in a scene image may be affected by low resolution, uneven illumination, or occlusion. We propose a novel method to binarize grayscale images by minimizing an energy functional. The Discrete Cosine Transform and the Angular Radial Transform are used to extract features from characters after normalization for scale and translation. We have evaluated our method on the complete test set of the Chars74k dataset for English and Kannada scripts, which consists of handwritten and synthesized characters as well as characters extracted from camera-captured images. We use only the synthesized and handwritten characters from this dataset as the training set. Nearest neighbor classification is used in our experiments.
Citations: 10
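The DCT half of this feature pipeline and the nearest neighbor classifier can be sketched with NumPy alone. The sketch assumes the character has already been binarized and normalized for scale and translation to a fixed size; the number of retained low-frequency coefficients (`keep=8`) and the Euclidean distance are illustrative assumptions, and the Angular Radial Transform features are not shown.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector of length n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)  # DC row scaled to sqrt(1/n) for orthonormality
    return m


def dct_features(img, keep=8):
    """Top-left keep x keep block of the 2-D DCT-II of a normalized character image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    coeffs = dct_matrix(h) @ img @ dct_matrix(w).T  # separable 2-D transform
    return coeffs[:keep, :keep].ravel()


def nearest_neighbor(query, train_feats, train_labels):
    """1-NN with Euclidean distance over feature vectors."""
    dists = np.linalg.norm(np.asarray(train_feats, dtype=float) - query, axis=1)
    return train_labels[int(np.argmin(dists))]
```

Because both 1-D bases are orthonormal, the full coefficient array has the same Euclidean distances as the images themselves; keeping only the low-frequency block retains the coarse character shape and discards fine detail and much of the pixel noise.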