2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR): Latest Publications

New Word Pair Level Embeddings to Improve Word Pair Similarity
Nazar Khan, Asma Shaukat
{"title":"New Word Pair Level Embeddings to Improve Word Pair Similarity","authors":"Nazar Khan, Asma Shaukat","doi":"10.1109/ICDAR.2017.329","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.329","url":null,"abstract":"We present a novel approach for computing similarity of English word pairs. While many previous approaches compute cosine similarity of individually computed word embeddings, we compute a single embedding for the word pair that is suited for similarity computation. Such embeddings are then used to train a machine learning model. Testing results on MEN and WordSim-353 datasets demonstrate that for the task of word pair similarity, computing word pair embeddings is better than computing word embeddings only.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128603789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
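As a rough illustration of the contrast drawn in the abstract above, the sketch below computes the cosine-similarity baseline from two individually computed word embeddings and, separately, builds a single pair-level feature vector. The `pair_features` construction (element-wise product concatenated with absolute difference) and the toy vectors are assumptions made for illustration only; the abstract does not say how the authors build their word pair embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (the baseline)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def pair_features(u, v):
    """Hypothetical pair-level representation (NOT the authors' method):
    element-wise product followed by element-wise absolute difference."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    products = [a * b for a, b in zip(u, v)]
    differences = [abs(a - b) for a, b in zip(u, v)]
    return products + differences


# Toy embeddings, invented for the example.
emb_a = [1.0, 2.0]
emb_b = [3.0, 4.0]

baseline_score = cosine_similarity(emb_a, emb_b)
pair_vector = pair_features(emb_a, emb_b)  # would be fed to a trained model
```

In a setup like the one the abstract describes, a vector such as `pair_vector` would be the input to a learned regressor, whereas `baseline_score` is used directly as the similarity.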
Shallow Neural Network Model for Hand-Drawn Symbol Recognition in Multi-Writer Scenario
S. Dey, Anjan Dutta, J. Lladós, A. Fornés, U. Pal
{"title":"Shallow Neural Network Model for Hand-Drawn Symbol Recognition in Multi-Writer Scenario","authors":"S. Dey, Anjan Dutta, J. Lladós, A. Fornés, U. Pal","doi":"10.1109/ICDAR.2017.263","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.263","url":null,"abstract":"One of the main challenges in hand-drawn symbol recognition is the variability among symbols caused by differing writer styles. In this paper, we present and discuss some results on recognizing hand-drawn symbols with a shallow neural network. A neural network model inspired by the LeNet architecture has been used to achieve state-of-the-art results with very little training data, unlike data-hungry deep neural networks. From the results, it has become evident that neural network architectures can efficiently describe and recognize hand-drawn symbols from different writers and can model the inter-author aberration.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129003944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A Symbol Dominance Based Formulae Recognition Approach for PDF Documents
Xiaode Zhang, Liangcai Gao, Ke Yuan, Runtao Liu, Zhuoren Jiang, Zhi Tang
{"title":"A Symbol Dominance Based Formulae Recognition Approach for PDF Documents","authors":"Xiaode Zhang, Liangcai Gao, Ke Yuan, Runtao Liu, Zhuoren Jiang, Zhi Tang","doi":"10.1109/ICDAR.2017.189","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.189","url":null,"abstract":"With more and more scientific documents becoming available in PDF format, recognition of formulae in these PDF documents is of great significance. In this paper, we propose a symbol dominance based formulae recognition approach to recovering formulae structures by using the rich information extracted directly from PDF files. The hierarchical structure of formula is represented by relationship tree, and the tree is built recursively based on symbol dominance, which considers both the spatial layout of symbols and the typesetting conventions of mathematics. In addition, we propose a special character recognition method to identify the formula characters with multiple components or variable unicode. Repeatable and comparable experiments have been done over two large datasets, IM2LATEX-100K and PDFME-10K. Experimental results demonstrate that our method is more adaptive and practical for PDF documents compared with other two existing available formulae recognition systems, INFTY and WYGIWYS.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124561898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents
Philip Kahle, S. Colutto, Günter Hackl, Günter Mühlberger
{"title":"Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents","authors":"Philip Kahle, S. Colutto, Günter Hackl, Günter Mühlberger","doi":"10.1109/ICDAR.2017.307","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.307","url":null,"abstract":"Transkribus is a comprehensive platform for the computer-aided transcription, recognition and retrieval of digitized historical documents. The main user interface is provided via an open-source desktop application that incorporates means to segment document images, to add a transcription and to tag entities within. The desktop application is able to connect to the platform's backend, which implements a document management system as well as several tools for document image analysis, such as layout analysis or automatic/handwritten text recognition (ATR/HTR). Access to documents, uploaded to the platform, may be granted to other users in order to collaborate on the transcription and to share results.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114108046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 88
A Compact CNN-DBLSTM Based Character Model for Online Handwritten Chinese Text Recognition
Kai Chen, Lily Tian, Haisong Ding, Meng Cai, Lei Sun, Sen Liang, Qiang Huo
{"title":"A Compact CNN-DBLSTM Based Character Model for Online Handwritten Chinese Text Recognition","authors":"Kai Chen, Lily Tian, Haisong Ding, Meng Cai, Lei Sun, Sen Liang, Qiang Huo","doi":"10.1109/ICDAR.2017.177","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.177","url":null,"abstract":"Recently, a character model based on an integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) has been demonstrated to be effective for online handwritten Chinese text recognition (HCTR). However, the reported CNN-DBLSTM topologies are too complex to be practically useful. In this paper, we propose a compact CNN-DBLSTM which has a small footprint and low computation cost yet is able to accommodate multiple receptive fields for CNN-based feature extraction. By using the training set of a popular benchmark database, namely CASIA-OLHWDB, we trained a compact CNN-DBLSTM by a connectionist temporal classification (CTC) criterion with a multi-step training strategy. Combining this character model with a character trigram language model, our online HCTR system with a WFST-based decoder has achieved state-of-the-art performance on both CASIA and ICDAR-2013 Chinese handwriting recognition competition test sets.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114763753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
A Multi-Label Neural Network Approach to Solving Connected CAPTCHAs
Ke Qing, Rong Zhang
{"title":"A Multi-Label Neural Network Approach to Solving Connected CAPTCHAs","authors":"Ke Qing, Rong Zhang","doi":"10.1109/ICDAR.2017.216","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.216","url":null,"abstract":"Text-based CAPTCHA as a security technology is used widely to distinguish human beings from computer programs. Compared with the classification of sub-images containing individual characters, segmentation is the key to standard approaches to solving CAPTCHAs automatically. However, the effectiveness of the traditional approaches is limited when the characters in CAPTCHAs are connected and distorted. In this paper, we propose a novel approach to solving CAPTCHAs without segmentation by using a multi-label convolutional neural network. The design of the network follows the procedure by which humans recognize CAPTCHAs containing connected characters and learn the correlation between neighboring characters. Our approach achieves high accuracy on various datasets of CAPTCHAs with sophisticated distortion and segmentation-resistance.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114868424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
A Noise-Resilient Super-Resolution Framework to Boost OCR Performance
Manoj Sharma, Anupama Ray, S. Chaudhury, Brejesh Lall
{"title":"A Noise-Resilient Super-Resolution Framework to Boost OCR Performance","authors":"Manoj Sharma, Anupama Ray, S. Chaudhury, Brejesh Lall","doi":"10.1109/ICDAR.2017.83","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.83","url":null,"abstract":"Recognizing text from noisy low-resolution (LR) images is extremely challenging and is an open problem for the computer vision community. Super-resolving a noisy LR text image results in noisy High Resolution (HR) text image, as super-resolution (SR) leads to spatial correlation in the noise, and further cannot be de-noised successfully. Traditional noise-resilient text image super-resolution methods utilize a denoising algorithm prior to text SR but denoising process leads to loss of some high frequency details, and the output HR image has missing information (texture details and edges). This paper proposes a noise-resilient SR framework for text images and recognizes the text using a deep BLSTM network trained on high resolution images. The proposed end-to-end deep learning based framework for noise-resilient text image SR simultaneously perform image denoising and super-resolution as well as preserves missing details. Stacked sparse denoising auto-encoder (SSDA) is learned for LR text image denoising, and our proposed coupled deep convolutional auto-encoder (CDCA) is learned for text image super-resolution. The pretrained weights for both these networks serve as initial weights to the end-to-end framework during finetuning, and the network is jointly optimized for both the tasks. We tested on several Indian Language datasets and the OCR performance of the noise-resilient super-resolved images is at par with the original HR images.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126358958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Segments Graph-Based Approach for Document Capture in a Smartphone Video Stream
Alexander Zhukovsky, D. Nikolaev, V. Arlazarov, V. V. Postnikov, D. Polevoy, N. Skoryukina, T. S. Chernov, J. Shemiakina, Arseniy Mukovozov, I. Konovalenko, M. Povolotsky
{"title":"Segments Graph-Based Approach for Document Capture in a Smartphone Video Stream","authors":"Alexander Zhukovsky, D. Nikolaev, V. Arlazarov, V. V. Postnikov, D. Polevoy, N. Skoryukina, T. S. Chernov, J. Shemiakina, Arseniy Mukovozov, I. Konovalenko, M. Povolotsky","doi":"10.1109/ICDAR.2017.63","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.63","url":null,"abstract":"The paper is devoted to the analysis of the problem of document boundaries detection in images and in a video stream. The paper proposes an algorithm for obtaining the position of the document, consisting of very reliable segments of a document boundaries extraction and a construction of an intersection graph that satisfies the projective model of the rectangle. An online algorithm for selecting and integrating possible document positions in a video stream based on the Kalman filter is proposed. The analysis of possible modifications of the algorithm and their effect on the final result are provided. Evaluation of the quality of the document at ICDAR'15 Smartphone Document Capture competition's dataset [1] showed a mean result of 95.5% in Jaccard index of projectively corrected document quadrangles and a 3rd place in the competition.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128046842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
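The abstract above reports its result as a Jaccard index between detected and ground-truth document regions. A minimal sketch of that measure follows, assuming each region has already been rasterized into a set of pixel coordinates; the `rectangle_pixels` helper and the example rectangles are hypothetical, and the competition's actual protocol (which compares projectively corrected quadrangles) is not reproduced here.

```python
def jaccard_index(region_a, region_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two regions given as pixel sets."""
    a, b = set(region_a), set(region_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen for this sketch: two empty regions match
    return len(a & b) / len(union)


def rectangle_pixels(x0, y0, x1, y1):
    """Hypothetical helper: pixels of an axis-aligned rectangle,
    covering x0 <= x < x1 and y0 <= y < y1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Invented example: a 10x10 ground-truth region and a detection shifted by 1 px.
ground_truth = rectangle_pixels(0, 0, 10, 10)
detected = rectangle_pixels(1, 0, 11, 10)

score = jaccard_index(ground_truth, detected)  # 90 shared pixels / 110 in union
```

A score of 1.0 means the two regions coincide exactly, and the value falls toward 0 as the overlap shrinks relative to the combined area.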
Exploiting State-of-the-Art Deep Learning Methods for Document Image Analysis
Vinaychandran Pondenkandath, Mathias Seuret, R. Ingold, Muhammad Zeshan Afzal, M. Liwicki
{"title":"Exploiting State-of-the-Art Deep Learning Methods for Document Image Analysis","authors":"Vinaychandran Pondenkandath, Mathias Seuret, R. Ingold, Muhammad Zeshan Afzal, M. Liwicki","doi":"10.1109/ICDAR.2017.325","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.325","url":null,"abstract":"This paper provides details of our (partially award-winning) methods submitted to four competitions of ICDAR 2017. In particular, they are designed to (i) classify scripts, (ii) perform pixel-based labeling for layout analysis, (iii) identify writers, and (iv) recognize font size and types. The methods build on the current state-of-the-art in Deep Learning and have been adapted to the specific needs of the individual tasks. All methods are variants of Convolutional Neural Network (CNN) with specialized architectures, initialization, and other tricks which have been introduced in the field of deep learning within the last few years.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128053002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Handwriting Recognition with Multigrams
Wassim Swaileh, T. Paquet, Yann Soullard, Pierrick Tranouez
{"title":"Handwriting Recognition with Multigrams","authors":"Wassim Swaileh, T. Paquet, Yann Soullard, Pierrick Tranouez","doi":"10.1109/ICDAR.2017.31","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.31","url":null,"abstract":"We introduce a novel handwriting recognition approach based on sub-lexical units known as multigrams of characters, that are variable lengths characters sequences. A Hidden Semi Markov model is used to model the multigrams occurrences within the target language corpus. Decoding the training language corpus with this model provides an optimized multigram lexicon of reduced size with high coverage rate of OOV compared to the traditional word modeling approach. The handwriting recognition system is composed of two components: the optical model and the statistical n-grams of multigrams language model. The two models are combined together during the recognition process using a decoding technique based on Weighted Finite State Transducers (WFST). We experiment the approach on two Latin language datasets (the French RIMES and English IAM datasets) and we show that it outperforms words and character models language models for high Out Of Vocabulary (OOV) words rates, and that it performs similarly to these traditional models for low OOV rates, with the advantage of a reduced complexity.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128112079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8