Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)最新文献_第8页

Iterated Document Content Classification 迭代文档内容分类

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.148

Chang An, H. Baird, Pingping Xiu

{"title":"Iterated Document Content Classification","authors":"Chang An, H. Baird, Pingping Xiu","doi":"10.1109/ICDAR.2007.148","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.148","url":null,"abstract":"We report an improved methodology for training classifiers for document image content extraction, that is, the location and segmentation of regions containing handwriting, machine-printed text, photographs, blank space, etc. Our previous methods classified each individual pixel separately (rather than regions): this avoids the arbitrariness and restrictiveness that result from constraining region shapes (to, e.g., rectangles). However, this policy also allows content classes to vary frequently within small regions, often yielding areas where several content classes are mixed together. This does not reflect the way that real content is organized: typically almost all small local regions are of uniform class. This observation suggested a post-classification methodology which enforces local uniformity without imposing a restricted class of region shapes. We choose features extracted from small local regions (e.g. 4-5 pixels radius) with which we train classifiers that operate on the output of previous classifiers, guided by ground truth. This provides a sequence of post-classifiers, each trained separately on the results of the previous classifier. Experiments on a highly diverse test set of 83 document images show that this method reduces per-pixel classification errors by 23%, and it dramatically increases the occurrence of large contiguous regions of uniform class, thus providing highly usable near-solid 'masks' with which to segment the images into distinct classes. It continues to allow a wide range of complex, non-rectilinear region shapes.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130404835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 25

Skew Detection for Chinese Handwriting by Horizontal Stroke Histogram 基于水平笔画直方图的汉字书写歪斜检测

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.233

Tong-Hua Su, Tian-Wen Zhang, Hu-Jie Huang, Yu Zhou

引用次数: 22

Deriving Symbol Dependent Edit Weights for Text Correction_The Use of Error Dictionaries 为文本纠错派生与符号相关的编辑权重——错误字典的使用

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.99

Christoph Ringlstetter, Ulrich Reffle, Annette Gotscharek, K. Schulz

{"title":"Deriving Symbol Dependent Edit Weights for Text Correction_The Use of Error Dictionaries","authors":"Christoph Ringlstetter, Ulrich Reffle, Annette Gotscharek, K. Schulz","doi":"10.1109/ICDAR.2007.99","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.99","url":null,"abstract":"Most systems for correcting errors in texts make use of specific word distance measures such as the Levenshtein distance. In many experiments it has been shown that correction accuracy is improved when using edit weights that depend on the particular symbols of the edit operation. However, most proposed approaches so far rely on high amounts of training data where errors and their corrections are collected. In practice, the preparation of suitable ground truth data is often too costly, which means that uniform edit costs are used. In this paper we evaluate approaches for deriving symbol dependent edit weights that do not need any ground truth training data, comparing them with methods based on ground truth training. We suggest a new approach where special error dictionaries are used to estimate weights. The method is simple and very efficient, needing one pass of the document to be corrected. Our experiments with different OCR systems and textual data show that the method consistently improves correction accuracy in a significant way, often leading to results comparable to those achieved with ground truth training.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130971510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Handwritten Chinese Character Recognition Using Modified LDA and Kernel FDA 基于改进LDA和核FDA的手写汉字识别

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.128

Duanduan Yang, Lianwen Jin

引用次数: 3

Combination of OCR Engines for Page Segmentation Based on Performance Evaluation 基于性能评估的组合OCR引擎页面分割

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.83

Miquel A. Ferrer, Ernest Valveny

引用次数: 2

A Two Stage Recognition Scheme for Handwritten Tamil Characters 手写泰米尔字符的两阶段识别方案

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.37

U. Bhattacharya, S. Ghosh, S. K. Parui

{"title":"A Two Stage Recognition Scheme for Handwritten Tamil Characters","authors":"U. Bhattacharya, S. Ghosh, S. K. Parui","doi":"10.1109/ICDAR.2007.37","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.37","url":null,"abstract":"India is a multilingual multiscript country with more than 18 languages and 10 different major scripts. Not enough research work towards recognition of handwritten characters of these Indian scripts has been done. Tamil, an official as well as popular script of the southern part of India, Singapore, Malaysia, and Sri Lanka has a large character set which includes many compound characters. Only a few works towards handwriting recognition of this large character set has been reported in the literature. Recently, HP Labs India developed a database of handwritten Tamil characters. In the present paper, we describe an off-line recognition approach based on this database. The proposed method consists of two stages. In the first stage, we apply an unsupervised clustering method to create a smaller number of groups of handwritten Tamil character classes. In the second stage, we consider a supervised classification technique in each of these smaller groups for final recognition. The features considered in the two stages are different. The proposed two-stage recognition scheme provided acceptable classification accuracies on both the training and test sets of the present database.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131686054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 58

A hybrid approach for off-line Arabic handwriting recognition based on a Planar Hidden Markov modeling 一种基于平面隐马尔可夫建模的离线阿拉伯手写识别混合方法

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.14

Sameh Masmoudi Touj, N. Amara, H. Amiri

引用次数: 31

Hidden Markov Models for Online Handwritten Tamil Word Recognition 隐马尔可夫模型用于在线手写泰米尔语单词识别

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.131

A. Bharath, S. Madhvanath

引用次数: 68

Pàtrà: A Novel Document Architecture for Integrating Handwriting with Audio-Visual Information Pàtrà:一种集成手写和视听信息的新型文档体系结构

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.204

Gaurav Harit, V. Mankar, S. Chaudhury

引用次数: 4

PRAAD: Preprocessing and Analysis Tool for Arabic Ancient Documents PRAAD:阿拉伯语古代文献预处理与分析工具

Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Pub Date : 2007-09-23 DOI: 10.1109/ICDAR.2007.209

Wafa Boussellaa, Abderrazak Zahour, B. Taconet, A. Alimi, A. BenAbdelhafid

引用次数: 9