{"title":"New Word Pair Level Embeddings to Improve Word Pair Similarity","authors":"Nazar Khan, Asma Shaukat","doi":"10.1109/ICDAR.2017.329","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.329","url":null,"abstract":"We present a novel approach for computing similarity of English word pairs. While many previous approaches compute cosine similarity of individually computed word embeddings, we compute a single embedding for the word pair that is suited for similarity computation. Such embeddings are then used to train a machine learning model. Testing results on MEN and WordSim-353 datasets demonstrate that for the task of word pair similarity, computing word pair embeddings is better than computing word embeddings only.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128603789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shallow Neural Network Model for Hand-Drawn Symbol Recognition in Multi-Writer Scenario","authors":"S. Dey, Anjan Dutta, J. Lladós, A. Fornés, U. Pal","doi":"10.1109/ICDAR.2017.263","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.263","url":null,"abstract":"One of the main challenges in hand-drawn symbol recognition is the variability among symbols caused by different writer styles. In this paper, we present and discuss some results recognizing hand-drawn symbols with a shallow neural network. A neural network model inspired by the LeNet architecture has been used to achieve state-of-the-art results with very little training data, unlike data-hungry deep neural networks. From the results, it has become evident that neural network architectures can efficiently describe and recognize hand-drawn symbols from different writers and can model the inter-author aberration.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129003944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Symbol Dominance Based Formulae Recognition Approach for PDF Documents","authors":"Xiaode Zhang, Liangcai Gao, Ke Yuan, Runtao Liu, Zhuoren Jiang, Zhi Tang","doi":"10.1109/ICDAR.2017.189","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.189","url":null,"abstract":"With more and more scientific documents becoming available in PDF format, recognition of formulae in these PDF documents is of great significance. In this paper, we propose a symbol dominance based formulae recognition approach to recovering formulae structures by using the rich information extracted directly from PDF files. The hierarchical structure of a formula is represented by a relationship tree, and the tree is built recursively based on symbol dominance, which considers both the spatial layout of symbols and the typesetting conventions of mathematics. In addition, we propose a special character recognition method to identify formula characters with multiple components or variable unicode. Repeatable and comparable experiments have been done over two large datasets, IM2LATEX-100K and PDFME-10K. Experimental results demonstrate that our method is more adaptive and practical for PDF documents compared with two other existing formulae recognition systems, INFTY and WYGIWYS.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124561898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents","authors":"Philip Kahle, S. Colutto, Günter Hackl, Günter Mühlberger","doi":"10.1109/ICDAR.2017.307","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.307","url":null,"abstract":"Transkribus is a comprehensive platform for the computer-aided transcription, recognition and retrieval of digitized historical documents. The main user interface is provided via an open-source desktop application that incorporates means to segment document images, to add a transcription and to tag entities within. The desktop application is able to connect to the platform's backend, which implements a document management system as well as several tools for document image analysis, such as layout analysis or automatic/handwritten text recognition (ATR/HTR). Access to documents, uploaded to the platform, may be granted to other users in order to collaborate on the transcription and to share results.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114108046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Compact CNN-DBLSTM Based Character Model for Online Handwritten Chinese Text Recognition","authors":"Kai Chen, Lily Tian, Haisong Ding, Meng Cai, Lei Sun, Sen Liang, Qiang Huo","doi":"10.1109/ICDAR.2017.177","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.177","url":null,"abstract":"Recently, a character model based on integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) has been demonstrated to be effective for online handwritten Chinese text recognition (HCTR). However, the reported CNN-DBLSTM topologies are too complex to be practically useful. In this paper, we propose a compact CNN-DBLSTM which has a small footprint and low computation cost yet is able to accommodate multiple receptive fields for CNN-based feature extraction. By using the training set of a popular benchmark database, namely CASIA-OLHWDB, we trained a compact CNN-DBLSTM by a connectionist temporal classification (CTC) criterion with a multi-step training strategy. Combining this character model with a character trigram language model, our online HCTR system with a WFST-based decoder has achieved state-of-the-art performance on both CASIA and ICDAR-2013 Chinese handwriting recognition competition test sets.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114763753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-Label Neural Network Approach to Solving Connected CAPTCHAs","authors":"Ke Qing, Rong Zhang","doi":"10.1109/ICDAR.2017.216","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.216","url":null,"abstract":"Text-based CAPTCHA as a security technology is used widely to distinguish human beings from computer programs. Compared with the classification of sub-image containing individual character, segmentation is the key to standard approaches to solving CAPTCHAs automatically. However, the effectiveness of the traditional approaches is limited when the characters in CAPTCHAs are connected and distorted. In this paper, we propose a novel approach to solving CAPTCHAs without segmentation via using a multi-label convolutional neural network. The design of the network refers to the procedure that humans recognize CAPTCHAs containing connected characters and learn the correlation between neighboring characters. Our approach archives high accuracy on various datasets of CAPTCHAs with sophisticated distortion and segmentation-resistance.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114868424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Noise-Resilient Super-Resolution Framework to Boost OCR Performance","authors":"Manoj Sharma, Anupama Ray, S. Chaudhury, Brejesh Lall","doi":"10.1109/ICDAR.2017.83","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.83","url":null,"abstract":"Recognizing text from noisy low-resolution (LR) images is extremely challenging and is an open problem for the computer vision community. Super-resolving a noisy LR text image results in a noisy High Resolution (HR) text image, as super-resolution (SR) leads to spatial correlation in the noise, which further cannot be de-noised successfully. Traditional noise-resilient text image super-resolution methods utilize a denoising algorithm prior to text SR, but the denoising process leads to loss of some high frequency details, and the output HR image has missing information (texture details and edges). This paper proposes a noise-resilient SR framework for text images and recognizes the text using a deep BLSTM network trained on high resolution images. The proposed end-to-end deep learning based framework for noise-resilient text image SR simultaneously performs image denoising and super-resolution as well as preserving missing details. A stacked sparse denoising auto-encoder (SSDA) is learned for LR text image denoising, and our proposed coupled deep convolutional auto-encoder (CDCA) is learned for text image super-resolution. The pretrained weights for both these networks serve as initial weights to the end-to-end framework during finetuning, and the network is jointly optimized for both tasks. We tested on several Indian Language datasets and the OCR performance of the noise-resilient super-resolved images is at par with the original HR images.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126358958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segments Graph-Based Approach for Document Capture in a Smartphone Video Stream","authors":"Alexander Zhukovsky, D. Nikolaev, V. Arlazarov, V. V. Postnikov, D. Polevoy, N. Skoryukina, T. S. Chernov, J. Shemiakina, Arseniy Mukovozov, I. Konovalenko, M. Povolotsky","doi":"10.1109/ICDAR.2017.63","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.63","url":null,"abstract":"The paper is devoted to the analysis of the problem of document boundaries detection in images and in a video stream. The paper proposes an algorithm for obtaining the position of the document, consisting of very reliable segments of a document boundaries extraction and a construction of an intersection graph that satisfies the projective model of the rectangle. An online algorithm for selecting and integrating possible document positions in a video stream based on the Kalman filter is proposed. The analysis of possible modifications of the algorithm and their effect on the final result are provided. Evaluation of the quality of the document at ICDAR'15 Smartphone Document Capture competition's dataset [1] showed a mean result of 95.5% in Jaccard index of projectively corrected document quadrangles and a 3rd place in the competition.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128046842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting State-of-the-Art Deep Learning Methods for Document Image Analysis","authors":"Vinaychandran Pondenkandath, Mathias Seuret, R. Ingold, Muhammad Zeshan Afzal, M. Liwicki","doi":"10.1109/ICDAR.2017.325","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.325","url":null,"abstract":"This paper provides details of our (partially award-winning) methods submitted to four competitions of ICDAR 2017. In particular, they are designed to (i) classify scripts, (ii) perform pixel-based labeling for layout analysis, (iii) identify writers, and (iv) recognize font size and types. The methods build on the current state-of-the-art in Deep Learning and have been adapted to the specific needs of the individual tasks. All methods are variants of Convolutional Neural Network (CNN) with specialized architectures, initialization, and other tricks which have been introduced in the field of deep learning within the last few years.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128053002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handwriting Recognition with Multigrams","authors":"Wassim Swaileh, T. Paquet, Yann Soullard, Pierrick Tranouez","doi":"10.1109/ICDAR.2017.31","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.31","url":null,"abstract":"We introduce a novel handwriting recognition approach based on sub-lexical units known as multigrams of characters, i.e., variable-length character sequences. A Hidden Semi-Markov model is used to model the multigram occurrences within the target language corpus. Decoding the training language corpus with this model provides an optimized multigram lexicon of reduced size with a high coverage rate of OOV compared to the traditional word modeling approach. The handwriting recognition system is composed of two components: the optical model and the statistical n-grams of multigrams language model. The two models are combined during the recognition process using a decoding technique based on Weighted Finite State Transducers (WFST). We experiment with the approach on two Latin language datasets (the French RIMES and English IAM datasets) and we show that it outperforms word and character language models for high Out Of Vocabulary (OOV) word rates, and that it performs similarly to these traditional models for low OOV rates, with the advantage of a reduced complexity.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128112079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}