2014 11th IAPR International Workshop on Document Analysis Systems: Latest Papers

The Robustness of a New 3D CAPTCHA
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.31
Qianru Ye, Youbin Chen, Bin B. Zhu
Abstract: CAPTCHA is a standard security technology for telling humans and computers apart, and the most widely used form is the text-based scheme. As many text schemes have been broken, 3D CAPTCHAs have emerged as one of the latest alternatives. In this paper, we study the robustness of the 3D text-based CAPTCHA adopted by Ku6, a leading video website in China, and provide the first analysis of a 3D hollow CAPTCHA. The security of this CAPTCHA scheme relies on a novel segmentation-resistance mechanism that combines the Crowding Characters Together (CCT) strategy with side surfaces, which form the 3D visual effect of characters and lead to promising usability even under strong overlapping between characters. However, by exploiting the unique features of the 3D characters in hollow font, i.e., parallel boundaries, the different stroke widths of side faces and front faces, and the relationships between them, we propose a technique that segments connected characters apart and repairs some overlapped parts. The segmentation success rate is 70%. With minor changes, our attack program works well on two variations of the scheme, with segmentation rates of 75% and 85%, respectively.
Citations: 10
Over-Generative Finite State Transducer N-Gram for Out-of-Vocabulary Word Recognition
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.24
Ronaldo O. Messina, Christopher Kermorvant
Abstract: Hybrid statistical grammars at both the word and character levels can be used to perform open-vocabulary recognition. This is usually done by allowing a special unknown-word symbol in the word-level grammar and dynamically replacing it with a (long) character-level n-gram, as the full transducer does not fit in the memory of most current computers. We present a modification of a finite-state transducer (FST) n-gram that enables the creation of a static transducer, i.e., one that can be used when on-demand composition is not possible. By combining paths in the "LG" transducer (the composition of lexicon and n-gram), making it over-generative with respect to the n-grams observed in the corpus, it is possible to reduce the number of actual occurrences of the character-level grammar, so that the resulting transducer fits in the memory of practical machines. We evaluate this model for handwriting recognition using the RIMES and IAM databases. We study its effect on the vocabulary size and show that this model is competitive with state-of-the-art solutions.
Citations: 22
Planting, Growing, and Pruning Trees: Connected Filters Applied to Document Image Analysis
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.36
G. Lazzara, T. Géraud, Roland Levillain
Abstract: Mathematical morphology, when used in the field of document image analysis and processing, is often limited to some classical yet basic tools. The domain, however, features a lesser-known class of powerful operators called connected filters. These operators have an important property: they neither shift nor create contours. Most connected filters are linked to a tree-based representation of an image's contents, where nodes represent connected components and edges express an inclusion relation. By computing attributes for each node of the tree from the corresponding connected component, then selecting nodes according to an attribute-based criterion, one can either filter or recognize objects in an image. This strategy is intuitive, efficient, easy to implement, and well suited to processing images of magazines. Examples of applications include image simplification, smart binarization, and object identification.
Citations: 12
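The attribute-based selection described in the abstract above can be illustrated with a much simpler case than the paper's tree-based operators. The sketch below is not the authors' implementation: it assumes a binary image stored as nested lists, 4-connectivity, and pixel count (area) as the only attribute. The function name `area_filter` and the parameter `min_area` are hypothetical.

```python
from collections import deque


def area_filter(image, min_area):
    """Keep only 4-connected foreground components with at least min_area pixels.

    `image` is a list of rows containing 0 (background) or 1 (foreground).
    This binary, flat-zone version is an illustrative assumption; the paper
    works on tree representations of grey-level images.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Attribute criterion: a component is kept whole or removed whole,
            # so the contours of surviving components are never shifted.
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    for row in area_filter(page, min_area=2):
        print(row)
```

Because each component is either kept intact or dropped entirely, the example shows the property the abstract highlights: no contour is moved or created.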
End-to-End Conversion of HTML Tables for Populating a Relational Database
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.9
G. Nagy, S. Seth, D. Embley
Abstract: Automating the conversion of human-readable HTML tables into machine-readable relational tables will enable end-user query processing of the millions of data tables found on the web. Theoretically sound and experimentally successful methods for index-based segmentation, extraction of category hierarchies, and construction of a canonical table suitable for direct input to a relational database are demonstrated on 200 heterogeneous web tables. The methods are scalable: the program generates the 198 Access-compatible CSV files in ~0.1 s per table (two tables could not be indexed).
Citations: 14
CERMINE -- Automatic Extraction of Metadata and References from Scientific Literature
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.63
Dominika Tkaczyk, P. Szostek, Piotr Jan Dendek, Mateusz Fedoryszak, Lukasz Bolikowski
Abstract: CERMINE is a comprehensive open-source system for extracting metadata and parsed bibliographic references from scientific articles in born-digital form. The system is based on a modular workflow whose architecture allows for single-step training and evaluation, enables effortless modification and replacement of individual components, and simplifies further expansion of the architecture. The implementations of most steps are based on supervised and unsupervised machine-learning techniques, which simplifies the process of adjusting the system to new document layouts. The paper describes the overall workflow architecture, provides details about individual implementations, and reports the evaluation methodology and results. The CERMINE service is available at http://cermine.ceon.pl.
Citations: 39
Holistic Recognition of Online Handwritten Words Based on an Ensemble of SVM Classifiers
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.67
Avinaba Srimany, Souvik Dutta, S. K. Parui, S. D. Chowdhury, U. Bhattacharya
Abstract: In this paper, we present our recent study of a data-driven approach to combining multiple SVM classifiers with RBF kernels, each trained with a distinct feature vector. The SVM classifiers in our ensemble are ranked in increasing order of their average performance on the validation sample sets. The outputs of the SVM classifiers are combined using a weighted-average strategy that uses these ranks to determine the respective weights. In the present study, we design four sets of different feature vectors representing online handwritten words. Simple concatenation of these feature vectors does not help much in improving the recognition accuracy compared to the best-performing feature vector among the four. Thus, we train distinct SVM classifiers with different feature vectors and combine their outputs at the final stage. The proposed recognition strategy is implemented on a limited-vocabulary recognition problem of unconstrained mixed cursive online handwritten Bangla words. It improves on existing recognition accuracies on a moderately large database of similar word samples.
Citations: 13
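The rank-weighted combination described in the abstract above can be sketched as follows. The abstract does not give the weight formula, so this example assumes one plausible choice: the classifier with the lowest validation accuracy gets rank 1, the best gets the highest rank, and each weight is the rank divided by the sum of all ranks. The function names and the score values are hypothetical, and plain lists of per-class scores stand in for real SVM outputs.

```python
def rank_weights(validation_accuracies):
    """Turn validation accuracies into weights proportional to rank.

    Assumption (not stated in the abstract): rank 1 goes to the weakest
    classifier, and weight = rank / sum(ranks).
    """
    order = sorted(range(len(validation_accuracies)),
                   key=lambda i: validation_accuracies[i])
    ranks = [0] * len(validation_accuracies)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    total = sum(ranks)
    return [rank / total for rank in ranks]


def combine_scores(scores_per_classifier, weights):
    """Weighted average of per-class scores, one score list per classifier."""
    n_classes = len(scores_per_classifier[0])
    return [
        sum(w * scores[k] for w, scores in zip(weights, scores_per_classifier))
        for k in range(n_classes)
    ]


def predict(scores_per_classifier, weights):
    """Index of the class with the highest combined score."""
    combined = combine_scores(scores_per_classifier, weights)
    return max(range(len(combined)), key=lambda k: combined[k])


if __name__ == "__main__":
    # Hypothetical validation accuracies for three feature-specific classifiers.
    weights = rank_weights([0.80, 0.90, 0.85])
    # Hypothetical per-class scores for one word sample (two classes).
    scores = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
    print(weights, predict(scores, weights))
```

Combining at the score level, rather than concatenating features, mirrors the late-fusion choice the abstract reports as more effective.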
Multi-oriented Handwritten Annotations Extraction from Scanned Documents
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.17
M. B. Jlaiel, R. Mullot, A. Alimi
Abstract: In this paper, we present an integrated system able to localize multi-oriented handwritten annotations in scanned documents. Unlike previous single methods, which limit the colors or types of annotations to be extracted, the proposed method attempts to extract annotations by fusing three feature extraction techniques based on internal and external shape analysis. Our method consists of two processes: 1) a coarse segmentation process, which divides the scanned document into text and non-text regions; and 2) a fine segmentation process, which consists of three steps: a feature extraction process, a classification process, and a majority voting process that identifies the segmented regions as machine-printed text or handwritten annotations. We find that our adaptive method outperforms all individual methods. Experimental results on a set of 301 annotated scanned documents are reported.
Citations: 8
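The majority voting step named in the abstract above can be sketched in a few lines. This is only an illustration, assuming that each of the three feature extraction techniques feeds its own classifier and that each classifier emits one label per region. The label strings and function names are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by the most classifiers for one region."""
    if not labels:
        raise ValueError("need at least one vote")
    label, _count = Counter(labels).most_common(1)[0]
    return label


def label_regions(votes_per_region):
    """Apply majority voting to every segmented region."""
    return [majority_vote(votes) for votes in votes_per_region]


if __name__ == "__main__":
    # Hypothetical votes from three classifiers for two regions.
    votes = [
        ["printed", "printed", "handwritten"],
        ["handwritten", "handwritten", "printed"],
    ]
    print(label_regions(votes))
```

With three voters and two classes a tie cannot occur, which is one practical reason to use an odd number of classifiers.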
Newspaper Article Extraction Using Hierarchical Fixed Point Model
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.42
Anukriti Bansal, S. Chaudhury, Sumantra Dutta Roy, J. B. Srivastava
Abstract: This paper presents a novel learning-based framework to extract articles from newspaper images using a fixed-point model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed-point model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model that works in two stages. In the first stage, a semantic label (heading, sub-heading, text block, image, or caption) is assigned to each segmented block. The labels are then used as input to the next stage to group the related blocks into news articles. Experimental results show the applicability of our algorithm in newspaper labeling and article extraction.
Citations: 14
Multiscale Stroke-Based Page Segmentation Approach
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.68
Mehdi Felhi, S. Tabbone, Maria V. Ortiz Segovia
Abstract: In this paper, we present a new hybrid page segmentation approach based on connected component and region analysis. We first describe our stroke descriptor, which detects text and line component candidates using the skeleton of the binarized document image. Then, an active contour model is applied to segment the rest of the image into photo and background regions. This classification is verified by studying the variation of each detected region. Finally, we cluster the text candidates according to their sizes using mean-shift analysis, and we present our adaptive projection profile approach to gather horizontal and vertical text regions separately. The method is applied to segmenting realistic scanned document images (newspapers and magazines) that contain text, lines, and photo regions. We evaluate the performance of our approach by comparing it to the existing methods that participated in the ICDAR page segmentation competition.
Citations: 6
AreCAPTCHA: Outsourcing Arabic Text Digitization to Native Speakers
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.50
M. Bakry, M. Khamis, Slim Abdennadher
Abstract: Demand for digitizing Arabic books and documents has increased recently, because digital books do not lose quality over time and can be easily sustained. Meanwhile, the number of Arabic-speaking Internet users is increasing. We propose AreCAPTCHA, a system that digitizes Arabic text by outsourcing it to native Arabic speakers while offering protective measures for the online web forms of Arabic websites. As users interact with AreCAPTCHA, we collect possible digitizations of words that were not recognized by OCR programs. We explain how the system works, the challenges we faced, and promising preliminary evaluation results.
Citations: 7