Automatic photometric restoration of historical photographic negatives
George V. Landon
The Hip, pp. 1-7, published 2013-08-24. DOI: 10.1145/2501115.2501133

Abstract: The majority of early photographs were captured on acetate-based film. However, these negatives will deteriorate beyond repair even with proper conservation, and no restoration method is available that does not physically alter each negative. In this paper, we present an automatic method for removing the nonlinear illumination distortions caused by deteriorating photographic support material. First, using a high-dynamic-range structured-light scanning method, a 2D Gaussian model of light transmission is estimated at each pixel of the negative image. The estimated amplitude at each pixel provides an accurate model of light transmission, but it also includes regions of lower transmission caused by damage. Principal component analysis (PCA) is then used to estimate the photometric error and restore the original illumination of the negative. Using both the shift in the Gaussian light stripes between pixels and the variation in their standard deviation, a 3D surface estimate is also computed. Experiments on real historical negatives show promising results for widespread adoption in memory institutions.
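A toy stand-in can illustrate the restoration step. The sketch below is illustrative only, not the authors' implementation: the function name, the rank-1 illumination model, and the synthetic data are all invented here. It treats the per-pixel amplitude map as a smooth low-rank illumination field and estimates each column's gain with a median, so localized damage is rejected as an outlier rather than absorbed into the fit:

```python
import numpy as np

def restore_illumination(amplitude, ref_col=0):
    """Toy stand-in for the paper's PCA-based restoration: fit a rank-1
    illumination model amplitude[i, j] ~ gain[j] * profile[i], estimating
    each column gain with a median so localized damage acts as an outlier."""
    profile = amplitude[:, ref_col]  # assumed-undamaged reference column
    gains = np.median(amplitude / profile[:, None], axis=0)
    return np.outer(profile, gains)

# Synthetic amplitude map: smooth rank-1 illumination plus a damaged patch.
x = np.linspace(0.0, 1.0, 64)
illum = np.outer(1.0 - 0.3 * x, 1.0 - 0.3 * x)
damaged = illum.copy()
damaged[20:30, 20:30] *= 0.5  # region of lowered transmission
restored = restore_illumination(damaged)
```

With the synthetic patch of halved transmission, the median-based gains ignore the damaged rows and the reconstruction recovers the smooth field.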
Why multiple document image binarizations improve OCR
William B. Lund, Douglas J. Kennard, Eric K. Ringger
The Hip, pp. 86-93, published 2013-08-24. DOI: 10.1145/2501115.2501126

Abstract: Our previous work has shown that error correction of optical character recognition (OCR) output on degraded historical machine-printed documents improves when multiple information sources and multiple OCR hypotheses are used, including hypotheses drawn from multiple binarizations of the same document image. The contribution of this paper is to demonstrate how diversity among multiple binarizations makes those improvements possible. We show the degree and breadth to which the information required for correction is distributed across the binarizations of a given document image. Our analysis reveals that the sources of these corrections are not limited to any single binarization and that the full range of binarizations holds information needed to achieve the best result, as measured by the word error rate (WER) of the final OCR decision. Even binarizations with high WERs contribute to improving the final OCR. For the corpus used in this research, fully 2.68% of all tokens are corrected using hypotheses not found in the OCR of the binarization with the lowest WER. Further, we show that the higher the overall WER, the more the corrections are distributed among all binarizations of the document image.
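Word error rate, the metric used throughout the paper, is the word-level edit distance normalized by reference length. A minimal sketch (not the authors' evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level Levenshtein distance (substitutions + insertions
    + deletions) divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one OCR substitution in a four-word reference gives `word_error_rate("the quick brown fox", "the quick brwn fox")` = 0.25.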
The multi angular descriptor (MAD): a binary and gray images descriptor for shape recognition
Raid Saabni
The Hip, pp. 53-58, published 2013-08-24. DOI: 10.1145/2501115.2501128

Abstract: In this paper, we present the Multi Angular Descriptor (MAD), a new shape descriptor for shape-based object recognition and image retrieval. In the binary case, MAD captures the angular view from each contour point to multi-resolution rings; placing the rings at different heights captures global and local features at multiple levels. In the gray-level case, it captures the weighted distribution of the shape points' positions relative to multi-resolution rings around the centroid. The descriptor is robust to noise and small deformations, and its flexible parameters make it tunable to the unique characteristics of different tasks. The gray-level extension of MAD can be seen as an extension of the shape context descriptor to low-quality gray-level images, avoiding the poor results of binarization. Testing the proposed descriptor on the MNIST dataset [16] and a private dataset using two matching techniques gave better results than the Shape Context and Histogram of Oriented Gradients (HOG) descriptors.
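A much-simplified, centroid-based variant can convey the flavor of a ring-and-angle histogram descriptor. The sketch below is invented for illustration and is not the paper's per-contour-point, multi-height formulation:

```python
import numpy as np

def angular_ring_descriptor(points, n_rings=3, n_angles=8):
    """Simplified sketch in the spirit of MAD: histogram shape points over
    concentric rings and angular sectors around the centroid, then
    L1-normalize.  Not the paper's exact formulation."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)            # translation invariance
    r = np.hypot(centered[:, 0], centered[:, 1])
    theta = np.arctan2(centered[:, 1], centered[:, 0])
    # Ring index from normalized radius, sector index from angle.
    ring = np.minimum((n_rings * r / (r.max() + 1e-12)).astype(int), n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_angles).astype(int) % n_angles
    hist = np.zeros((n_rings, n_angles))
    np.add.at(hist, (ring, sector), 1.0)
    return (hist / hist.sum()).ravel()

# Example shape: 100 samples on a circle.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
d = angular_ring_descriptor(circle)
```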
An efficient parametrization of character degradation model for semi-synthetic image generation
V. C. Kieu, M. Visani, N. Journet, R. Mullot, J. Domenger
The Hip, pp. 29-35, published 2013-08-24. DOI: 10.1145/2501115.2501127

Abstract: This paper presents an efficient parametrization method for generating synthetic noise on document images. By specifying the desired categories and amount of noise, the method can generate synthetic document images exhibiting most of the degradations observed in real documents (ink splotches, white specks, or streaks). The ability to simulate different amounts and kinds of noise makes it possible to evaluate the robustness of many document image analysis methods, and to generate data for algorithms that employ a learning process. The degradation model presented in [7] needs eight parameters for randomly generating noise regions. We propose an extension of this model that sets the eight parameters automatically, to generate precisely what a user wants (amount and category of noise). Our proposal consists of three steps. First, Nsp seed points (i.e., centres of noise regions) are selected by an adaptive procedure. Then, these seed points are classified into three categories of noise using a heuristic rule. Finally, the size of each noise region is set by a random process in order to generate degradations that are as realistic as possible.
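The seed-point idea can be caricatured in a few lines. The sketch below is a loose illustration only, not the model of [7] or the proposed parametrization; the function name, defaults, and the two noise kinds are invented here:

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(image, n_seeds=20, max_radius=3):
    """Illustrative sketch of seed-point degradation: pick seed points on
    the ink and stamp disks of random size, flipping pixels to simulate
    white specks or ink splotches (0 = ink, 255 = paper)."""
    out = image.copy()
    h, w = out.shape
    ys, xs = np.nonzero(out == 0)  # candidate seed points on the ink
    if len(ys) == 0:
        return out
    for _ in range(n_seeds):
        k = rng.integers(len(ys))
        cy, cx, rad = ys[k], xs[k], rng.integers(1, max_radius + 1)
        yy, xx = np.ogrid[:h, :w]
        disk = (yy - cy) ** 2 + (xx - cx) ** 2 <= rad ** 2
        out[disk] = 255 if rng.random() < 0.5 else 0  # speck or splotch
    return out

# A synthetic vertical stroke, then its degraded version.
img = np.full((32, 32), 255, dtype=np.uint8)
img[8:24, 14:18] = 0
noisy = degrade(img)
```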
Character segmentation and retrieval for learning support system of Japanese historical books
Chulapong Panichkriangkrai, Liang Li, K. Hachimura
The Hip, pp. 118-122, published 2013-08-24. DOI: 10.1145/2501115.2501129

Abstract: This paper proposes a character segmentation and retrieval method for a learning support system that analyzes digitized Japanese historical woodblock-printed books. The proposed system detects text lines, segments characters, and retrieves similar characters from document images. The process includes background separation, text line extraction, rule-based character integration and segmentation, and similar-character retrieval. The experimental results show that the proposed method segmented all text lines correctly, successfully extracted more than 79% of the complicated characters, and provided promising character retrieval results.
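The text-line-extraction step can be sketched with a classic projection profile. The code below is an invented minimal illustration (the paper's pipeline is more elaborate): Japanese text runs in vertical lines, so counting ink pixels per image column separates the lines:

```python
import numpy as np

def find_text_columns(binary, min_ink=1):
    """Detect vertical text lines by the column projection profile of a
    binary page image (0 = ink, 255 = paper).  Returns [start, end)
    column spans."""
    ink_per_column = (binary == 0).sum(axis=0)
    in_line, lines, start = False, [], 0
    for j, count in enumerate(ink_per_column):
        if count >= min_ink and not in_line:
            in_line, start = True, j
        elif count < min_ink and in_line:
            in_line = False
            lines.append((start, j))
    if in_line:
        lines.append((start, len(ink_per_column)))
    return lines

# Two synthetic vertical text lines on a white page.
page = np.full((40, 30), 255, dtype=np.uint8)
page[5:35, 4:8] = 0
page[5:35, 18:22] = 0
print(find_text_columns(page))  # → [(4, 8), (18, 22)]
```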
Cost effective ontology population with data from lists in OCRed historical documents
T. Packer, D. Embley
The Hip, pp. 44-52, published 2013-08-24. DOI: 10.1145/2501115.2501132

Abstract: A method for automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make a variety of historical knowledge machine searchable, queryable, and linkable. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selection of human guidance. We propose ListReader, a wrapper-induction solution for information extraction that is specialized for lists in OCRed documents. ListReader can induce either a regular-expression grammar or a hidden Markov model, each of which can infer list structure and field labels from OCR text. We decrease the cost and improve the accuracy of the induction process using semi-supervised machine learning and active learning, allowing induction of a wrapper from as few as a single hand-labeled instance per field per list. After applying an induced wrapper, ListReader automatically maps the labeled text it produces to a rich variety of ontologically structured predicates. We evaluate our implementation on family history books in terms of the typical F-measure and a new metric, "Label Efficiency", which measures both extraction quality and cost in a single number. We show with statistical significance that ListReader reaches values closer to optimal levels than a state-of-the-art statistical sequence labeler.
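What a regular-expression wrapper over a list looks like can be shown in miniature. The wrapper below is written by hand for a made-up family-history line format (ListReader induces such wrappers automatically; the field names and format here are invented for illustration):

```python
import re

# Hypothetical list format: "Smith, John b. 1850 d. 1912".
WRAPPER = re.compile(
    r"(?P<surname>[A-Z][a-z]+),\s+(?P<given>[A-Z][a-z]+)"
    r"\s+b\.\s+(?P<birth>\d{4})(?:\s+d\.\s+(?P<death>\d{4}))?"
)

def extract_facts(lines):
    """Apply the wrapper and map labeled fields to simple predicates,
    mirroring (in miniature) ListReader's label-to-ontology mapping."""
    facts = []
    for line in lines:
        m = WRAPPER.search(line)
        if m:
            facts.append({k: v for k, v in m.groupdict().items() if v})
    return facts

records = extract_facts(["Smith, John b. 1850 d. 1912", "Jones, Mary b. 1877"])
```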
A re-assembling scheme of fragmented Mokkan images
T. V. Phan, Hajime Baba, Akihiro Watanabe, M. Nakagawa
The Hip, pp. 22-28, published 2013-08-24. DOI: 10.1145/2501115.2501122

Abstract: Historical documents are invaluable for studying the society and culture of past ages throughout the world. In Japan, wooden tablets called Mokkan, excavated from ancient palace sites of the Nara period, provide important clues about the era. However, since most unearthed Mokkan are badly damaged and broken into several pieces, it is extremely difficult even for experts to read the characters on the fragments. In this paper, we propose a digital image reassembly scheme for fragmented Mokkan, so that broken character images can be reassembled and the written content analyzed. The proposed scheme consists of two steps: image grouping using color features, and image reassembly using local tangent and curvature functions of the fragment contours. After the grouping step, fragment images with similar color features are clustered together. In the reassembly step, candidate matching pairs of adjacent fragment images within the same group are listed. We also provide a user interface that lets archaeologists verify the results. The system thus helps archaeologists reconstruct Mokkan images so that they can decipher them.
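The first step, grouping fragments by color features, can be sketched with a greedy mean-color clustering. This is an invented simplification (the paper's color features, threshold, and clustering are more sophisticated):

```python
import numpy as np

def group_by_color(fragments, threshold=30.0):
    """Greedy grouping of fragment images by mean RGB color: a fragment
    joins the first existing group whose centroid is within `threshold`
    (Euclidean distance in RGB), otherwise it starts a new group."""
    groups, centroids = [], []
    for frag in fragments:
        mean = np.asarray(frag, dtype=float).reshape(-1, 3).mean(axis=0)
        for g, c in enumerate(centroids):
            if np.linalg.norm(mean - c) < threshold:
                groups.append(g)
                break
        else:
            centroids.append(mean)
            groups.append(len(centroids) - 1)
    return groups

# Two dark wood-colored fragments and one much lighter fragment.
dark1 = np.ones((4, 4, 3)) * np.array([120.0, 90.0, 60.0])
dark2 = np.ones((4, 4, 3)) * np.array([125.0, 95.0, 62.0])
light = np.ones((4, 4, 3)) * np.array([200.0, 180.0, 150.0])
groups = group_by_color([dark1, dark2, light])
```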
Nonrigid recto-verso registration using page outline structure and content preserving warps
Róisín Rowley-Brooke, François Pitié, A. Kokaram
The Hip, pp. 8-13, published 2013-08-24. DOI: 10.1145/2501115.2501124

Abstract: Accurate registration of document recto and verso sides with bleed-through degradation is essential for accurate automatic non-blind bleed-through removal. This paper presents a registration method for documents with bleed-through degradation, together with an objective registration evaluation scheme. In the proposed method, the two sides are first globally aligned using the outline of the page; a local grid-point warp is then applied, using as the error metric the sum of squared differences of both the image intensities and the gradient fields, together with a content-preserving smoothness penalty. The displacement fields of the combined global and local warps are evaluated against manually registered full-document displacement fields and compared with recent document registration methods.
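The data term of the local warp can be sketched directly from the abstract: the sum of squared differences of intensities plus a weighted SSD of the gradient fields. The function below is a minimal sketch under that reading (the weight `alpha` and the function name are assumptions, and the smoothness penalty is omitted):

```python
import numpy as np

def registration_error(recto, verso, alpha=1.0):
    """SSD of image intensities plus weighted SSD of the gradient fields,
    as in the paper's local-warp error metric.  In practice the verso
    image would be mirrored before comparison, since bleed-through
    appears flipped on the opposite side."""
    recto = np.asarray(recto, dtype=float)
    verso = np.asarray(verso, dtype=float)
    ssd = ((recto - verso) ** 2).sum()
    gy_r, gx_r = np.gradient(recto)
    gy_v, gx_v = np.gradient(verso)
    grad_ssd = ((gy_r - gy_v) ** 2).sum() + ((gx_r - gx_v) ** 2).sum()
    return ssd + alpha * grad_ssd

# A small ink blob; shifting it by one pixel should raise the error.
page = np.zeros((8, 8))
page[3:5, 3:5] = 1.0
shifted = np.roll(page, 1, axis=1)
```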
Texture feature evaluation for segmentation of historical document images
Maroua Mehri, Petra Gomez-Krämer, P. Héroux, A. Boucher, R. Mullot
The Hip, pp. 102-109, published 2013-08-24. DOI: 10.1145/2501115.2501121

Abstract: Texture feature analysis has grown tremendously in recent years and plays an important role in the analysis of many kinds of images. More recently, texture analysis techniques have become a logical and relevant choice for historical document image segmentation, given significant image degradation and the lack of information on document structure, such as the document model and the typographical parameters. However, previous work on texture analysis for segmentation of digitized historical documents has been limited to testing one of the well-known texture-based approaches in isolation: the autocorrelation function, the grey-level co-occurrence matrix (GLCM), Gabor filters, gradients, wavelets, etc. In this paper we ask which texture-based method is better suited, on the one hand, for discriminating graphical regions from textual ones and, on the other, for separating textual regions of different sizes and fonts. The objective of this paper is to compare three of the well-known texture-based approaches (autocorrelation function, GLCM, and Gabor filters) for segmentation of digitized historical document images. The texture features are briefly described, and quantitative results are obtained on simplified historical document images. The results achieved are very encouraging.
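One of the compared feature families, the GLCM, is easy to sketch. The code below builds a co-occurrence matrix for a single pixel offset and derives the classic contrast and homogeneity statistics (a textbook illustration, not the paper's feature set or parameters):

```python
import numpy as np

def glcm_features(image, levels=4, dx=1, dy=0):
    """Build a grey-level co-occurrence matrix for offset (dx, dy) over an
    image quantized to `levels` grey levels, then compute the classic
    contrast and homogeneity statistics."""
    img = np.asarray(image)
    glcm = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            glcm[img[y, x], img[y + dy, x + dx]] += 1
    glcm /= glcm.sum()  # normalize to a joint probability
    i, j = np.indices(glcm.shape)
    contrast = (glcm * (i - j) ** 2).sum()
    homogeneity = (glcm / (1.0 + np.abs(i - j))).sum()
    return contrast, homogeneity

# A flat region has zero contrast; a checkerboard maximizes it.
flat = np.zeros((8, 8), dtype=int)
checker = np.indices((8, 8)).sum(axis=0) % 2
c_flat, h_flat = glcm_features(flat)
c_chk, h_chk = glcm_features(checker)
```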
Robust text and drawing segmentation algorithm for historical documents
Rafi Cohen, Abedelkadir Asi, K. Kedem, Jihad El-Sana, I. Dinstein
The Hip, pp. 110-117, published 2013-08-24. DOI: 10.1145/2501115.2501117

Abstract: We present a method for segmenting historical document images into regions of different content. First, text elements are separated from non-text elements using a binarized version of the document. The segmentation of the non-text regions is then refined into drawings, background, and noise; at this stage, spatial and color features are exploited to guarantee coherent regions in the final segmentation. Experiments show that the suggested approach achieves better segmentation quality than other methods. On 252 pages of a historical manuscript, the suggested method achieves about 92% and 90% segmentation accuracy for drawings and text elements, respectively.
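The first stage, splitting text from non-text on a binarized page, can be caricatured with connected components and a size heuristic. This toy version is invented for illustration; the paper's classifier also uses spatial and color cues:

```python
import numpy as np
from collections import deque

def label_components(binary):
    """4-connected component labeling (BFS) on a boolean ink mask."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                current += 1
                labels[sy, sx] = current
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] \
                                and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

def classify_components(binary, text_max_area=20):
    """Toy text/non-text split: small components are treated as text,
    large ones as drawings."""
    labels, n = label_components(binary)
    sizes = {k: int((labels == k).sum()) for k in range(1, n + 1)}
    return {k: ("text" if s <= text_max_area else "drawing")
            for k, s in sizes.items()}

# A small character-sized blob and a large drawing-sized blob.
binary = np.zeros((20, 20), dtype=bool)
binary[2:5, 2:5] = True      # 9 pixels  -> text
binary[8:18, 8:18] = True    # 100 pixels -> drawing
result = classify_components(binary)
```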