{"title":"Handwritten stenography recognition and the LION dataset","authors":"Raphaela Heil, Malin Nauwerck","doi":"10.1007/s10032-024-00479-6","DOIUrl":"https://doi.org/10.1007/s10032-024-00479-6","url":null,"abstract":"<p>In this paper, we establish the first baseline for handwritten stenography recognition, using the novel LION dataset, and investigate the impact of including selected aspects of stenographic theory in the recognition process. We make the LION dataset publicly available with the aim of encouraging future research in handwritten stenography recognition. A state-of-the-art text recognition model is trained to establish a baseline. Stenographic domain knowledge is integrated by transforming the target sequences into representations which approximate diplomatic transcriptions, wherein each symbol in the script is represented by its own character in the transliteration, as opposed to corresponding combinations of characters from the Swedish alphabet. Four such encoding schemes are evaluated, and results are further improved by integrating a pre-training scheme based on synthetic data. The baseline model achieves an average test character error rate (CER) of 29.81% and a word error rate (WER) of 55.14%. Test error rates are reduced significantly (<i>p</i> < 0.01) by combining stenography-specific target sequence encodings with pre-training and fine-tuning, yielding CERs in the range of 24.5–26% and WERs of 44.8–48.2%. An analysis of selected recognition errors illustrates the challenges that the stenographic writing system poses to text recognition. This work establishes the first baseline for handwritten stenography recognition. Our proposed integration of stenography-specific knowledge, in conjunction with pre-training and fine-tuning on synthetic data, yields considerable improvements. Together with our precursor study on the subject, this is the first work to apply modern handwritten text recognition to stenography. 
The dataset and our code are publicly available via Zenodo.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"2015 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141502860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compactnet: a lightweight convolutional neural network for one-shot online signature verification","authors":"Napa Sae-Bae, Nida Chatwattanasiri, Somkait Udomhunsakul","doi":"10.1007/s10032-024-00478-7","DOIUrl":"https://doi.org/10.1007/s10032-024-00478-7","url":null,"abstract":"<p>This paper proposes a method for the online signature verification task that allows a signature to be verified effectively using a single enrolled signature sample. The method utilizes a neural network with two one-dimensional convolutional neural network (1D-CNN) components to extract the vector representation of an online signature. The first component is a global 1D-CNN with full-length kernels. The second component is the standard 1D-CNN with partial-length kernels that has been successfully used in many time-series classification tasks. The network is trained on a set of online signature samples to extract the vector representation of unknown signatures. The experimental results demonstrate that, when using a vector representation derived from the proposed network, verification with a single enrolled signature sample unseen during training achieved an Equal Error Rate (EER) of 4.35% when tested against authentic signatures of other users. This result indicates the effectiveness of the network in accurately distinguishing between genuine signatures and those of different users.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"33 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141166470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental study of rehearsal-based incremental classification of document streams","authors":"Usman Malik, Muriel Visani, Nicolas Sidere, Mickael Coustaty, Aurelie Joseph","doi":"10.1007/s10032-024-00467-w","DOIUrl":"https://doi.org/10.1007/s10032-024-00467-w","url":null,"abstract":"<p>This research work proposes a novel protocol for rehearsal-based incremental learning models for the classification of business document streams using deep learning and, in particular, transformer-based natural language processing techniques. When implementing a rehearsal-based incremental classification model, the questions raised most often for parameterizing the model relate to the number of instances from “old” classes (learned in previous training iterations) which need to be kept in memory and the optimal number of new classes to be learned at each iteration. In this paper, we propose an incremental learning protocol that involves training incremental models using a weight-sharing strategy between transformer model layers across incremental training iterations. We provide a thorough experimental study that enables us to determine optimal ranges for various parameters in the context of incremental classification of business document streams. We also study the effect of the order in which the classes are presented to the model for learning and the effects of class imbalance on the model’s performance. Our results reveal no significant difference in the performance of our incrementally trained model and its statically trained counterpart after all training iterations (especially when, in the presence of class imbalance, the most represented classes are learned first). In addition, our proposed approach shows improvements of 1.55% and 3.66% over a baseline model on two business document datasets. 
Based on this experimental study, we provide a list of recommendations for researchers and developers for training rehearsal-based incremental classification models for business document streams. Our protocol can be further re-used for other final applications.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"67 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140940763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deformity removal from handwritten text documents using variable cycle GAN","authors":"Shivangi Nigam, Adarsh Prasad Behera, Shekhar Verma, P. Nagabhushan","doi":"10.1007/s10032-024-00466-x","DOIUrl":"https://doi.org/10.1007/s10032-024-00466-x","url":null,"abstract":"<p>Text recognition systems typically work well for printed documents but struggle with handwritten documents due to different writing styles, background complexities, added noise of image acquisition methods, and deformed text images such as strike-offs and underlines. These deformities change the structural information, making it difficult to restore the deformed images while maintaining the structural information and preserving the semantic dependencies of the local pixels. Current adversarial networks are unable to preserve the structural and semantic dependencies as they focus on individual pixel-to-pixel variation and encourage non-meaningful aspects of the images. To address this, we propose a Variable Cycle Generative Adversarial Network (<i>VCGAN</i>) that considers the perceptual quality of the images. By using a variable content loss, the Top-<i>k</i> Variable Loss (<i>TV<sub>k</sub></i>), <i>VCGAN</i> preserves the inter-dependence of spatially close pixels while removing the strike-off strokes. The similarity of the images is computed with <i>TV<sub>k</sub></i>, considering the intensity variations that do not interfere with the semantic structures of the image. 
Our results show that <i>VCGAN</i> can remove most deformities with an elevated <i>F</i>1 score of 97.40% and outperforms current state-of-the-art algorithms with a character error rate of 7.64% and word accuracy of 81.53% when tested on the handwritten text recognition system.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"18 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated systems for diagnosis of dysgraphia in children: a survey and novel framework","authors":"Jayakanth Kunhoth, Somaya Al-Maadeed, Suchithra Kunhoth, Younes Akbari, Moutaz Saleh","doi":"10.1007/s10032-024-00464-z","DOIUrl":"https://doi.org/10.1007/s10032-024-00464-z","url":null,"abstract":"<p>Learning disabilities, which primarily interfere with basic learning skills such as reading, writing, and math, are known to affect around 10% of children in the world. Poor motor skills and motor coordination, as part of a neurodevelopmental disorder, can become a causative factor in the difficulty of learning to write (dysgraphia), hindering an individual’s academic track. The signs and symptoms of dysgraphia include, but are not limited to, irregular handwriting, improper handling of the writing medium, slow or labored writing, and unusual hand position. The widely accepted assessment criterion for all types of learning disabilities, including dysgraphia, has traditionally relied on examinations conducted by medical experts. However, in recent years, artificial intelligence has been employed to develop diagnostic systems for learning disabilities, utilizing diverse modalities of data, including handwriting analysis. This work presents a review of the existing automated dysgraphia diagnosis systems for children in the literature, with a main focus on artificial intelligence-based systems. It discusses the data collection methods, important handwriting features, and machine learning algorithms employed in the literature for the diagnosis of dysgraphia. Apart from that, this article discusses some of the non-artificial-intelligence-based automated systems. 
Furthermore, this article discusses the drawbacks of existing systems and proposes a novel framework for dysgraphia diagnosis and assistance evaluation.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"111 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Children age group detection based on human–computer interaction and time series analysis","authors":"Juan Carlos Ruiz-Garcia, Carlos Hojas, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Jaime Herreros-Rodriguez","doi":"10.1007/s10032-024-00462-1","DOIUrl":"https://doi.org/10.1007/s10032-024-00462-1","url":null,"abstract":"<p>This article proposes a novel children–computer interaction (CCI) approach for the task of age group detection. This approach focuses on the automatic analysis of the time series generated from the interaction of the children with mobile devices. In particular, we extract a set of 25 time series related to spatial, pressure, and kinematic information of the children interaction while colouring a tree through a pen stylus tablet, a specific test from the large-scale public ChildCIdb database. A complete analysis of the proposed approach is carried out using different time series selection techniques to choose the most discriminative ones for the age group detection task: (i) a statistical analysis and (ii) an automatic algorithm called sequential forward search (SFS). In addition, different classification algorithms such as dynamic time warping barycenter averaging (DBA) and hidden Markov models (HMM) are studied. Accuracy results over 85% are achieved, outperforming previous approaches in the literature and in more challenging age group conditions. 
Finally, the approach presented in this study can benefit many children-related applications, for example, by adapting technology to provide an age-appropriate environment for the child.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"120 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140056600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An unsupervised automatic organization method for Professor Shirakawa’s hand-notated documents of oracle bone inscriptions","authors":"Xuebin Yue, Ziming Wang, Ryuto Ishibashi, Hayata Kaneko, Lin Meng","doi":"10.1007/s10032-024-00463-0","DOIUrl":"https://doi.org/10.1007/s10032-024-00463-0","url":null,"abstract":"<p>One of the most influential researchers of Chinese culture in the second half of the twentieth century, Professor Shirakawa was active in the research field of ancient Chinese characters. He left behind many valuable research documents, especially his hand-notated oracle bone inscription (OBI) documents. OBIs are among the world’s oldest characters and were used in the Shang Dynasty about 3600 years ago for divination and recording events. Organizing these OBIs is helpful not only for better understanding Prof. Shirakawa’s research but also for the further study of OBIs in general and of their importance in ancient Chinese history. This paper proposes an unsupervised automatic organization method to organize Prof. Shirakawa’s OBIs and construct a handwritten OBI data set for neural network learning. First, a noise-reduction suite is proposed to remove irregularly shaped noise while minimizing the loss of OBI data. Secondly, a novel segmentation method based on the supervised classification of OBI regions is proposed to reduce adverse effects between characters for more accurate OBI segmentation. Thirdly, a unique unsupervised clustering method is proposed to classify the segmented characters. Finally, all instances of the same character in the hand-notated OBI documents are grouped together. The evaluation results show that the proposed noise reduction removes noise, including number annotations and closed-loop-like edges in the dataset, with an accuracy of 97.85%. 
In addition, the accuracy of the supervised classification of OBI regions based on our model reaches 85.50%, which is higher than that of eight state-of-the-art deep learning models, and a particular preprocessing method we proposed improves the classification accuracy by nearly 11.50%. The accuracy of OBI clustering based on supervised classification reaches 74.91%. These results demonstrate the effectiveness of our proposed unsupervised automatic organization of Prof. Shirakawa’s hand-notated OBI documents. The code and datasets are available at http://www.ihpc.se.ritsumei.ac.jp/obidataset.html.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"101 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140046535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the improvement of handwritten text line recognition with octave convolutional recurrent neural networks","authors":"Dayvid Castro, Cleber Zanchettin, Luís A. Nunes Amaral","doi":"10.1007/s10032-024-00460-3","DOIUrl":"https://doi.org/10.1007/s10032-024-00460-3","url":null,"abstract":"<p>Off-line handwritten text recognition (HTR) poses a significant challenge due to the complexities of variable handwriting styles, background degradation, and unconstrained word sequences. This work tackles the handwritten text line recognition problem using octave convolutional recurrent neural networks (OctCRNN). Our approach requires no word segmentation, preprocessing, or explicit feature extraction and leverages octave convolutions to process multiscale features without increasing the number of learnable parameters. We thoroughly investigate the OctCRNN under different settings, including an octave design that efficiently balances computational cost and recognition performance, formulating an experimental pipeline with a visualization step to gain intuition about how the model works compared to a counterpart based on traditional convolutions. The system is completed by adding a language model to incorporate linguistic knowledge. Finally, we assess the performance of our solution using character and word error rates against established handwritten text recognition benchmarks: IAM, RIMES, and ICFHR 2016 READ. According to the results, our proposal achieves state-of-the-art performance while reducing the computational requirements. 
Our findings suggest that the architecture provides a robust framework for building HTR systems.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"3 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139911073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training transformer architectures on few annotated data: an application to historical handwritten text recognition","authors":"Killian Barrere, Yann Soullard, Aurélie Lemaitre, Bertrand Coüasnon","doi":"10.1007/s10032-023-00459-2","DOIUrl":"https://doi.org/10.1007/s10032-023-00459-2","url":null,"abstract":"<p>Transformer-based architectures show excellent results on the task of handwritten text recognition, becoming the standard architecture for modern datasets. However, they require a significant amount of annotated data to achieve competitive results, and typically rely on synthetic data to solve this problem. Historical handwritten text recognition represents a challenging task due to degradations, specific handwritings for which few examples are available, and ancient languages that vary over time. These limitations also make it difficult to generate realistic synthetic data. Given sufficient and appropriate data, Transformer-based architectures could alleviate these concerns, thanks to their ability to have a global view of textual images and their language modeling capabilities. In this paper, we propose the use of a lightweight Transformer model to tackle the task of historical handwritten text recognition. To train the architecture, we introduce realistic-looking synthetic data reproducing the style of historical handwritings. We present a specific strategy, both for training and prediction, to deal with historical documents, where only a limited amount of training data is available. We evaluate our approach on the ICFHR 2018 READ dataset, which is dedicated to handwriting recognition in specific historical documents. 
The results show that our Transformer-based approach is able to outperform existing methods.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"26 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139580505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Background grid extraction from historical hand-drawn cadastral maps","authors":"Tauseef Iftikhar, Nazar Khan","doi":"10.1007/s10032-023-00457-4","DOIUrl":"https://doi.org/10.1007/s10032-023-00457-4","url":null,"abstract":"<p>We tackle a novel problem of detecting background grids in hand-drawn cadastral maps. Grid extraction is necessary for accessing and contextualizing the actual map content. The problem is challenging since the background grid is the bottommost map layer and is severely occluded by subsequent map layers. We present a novel automatic method for robust, bottom-up extraction of background grid structures in historical cadastral maps. The proposed algorithm extracts grid structures under significant occlusion, missing information, and noise by iteratively providing an increasingly refined estimate of the grid structure. The key idea is to exploit the periodicity of background grid lines to corroborate the existence of each other. We also present an automatic scheme for determining the ‘gridness’ of any detected grid, so that the proposed method self-evaluates its result as being good or poor without using ground truth. We present empirical evidence to show that the proposed gridness measure is a good indicator of quality. On a dataset of 268 historical cadastral maps with resolution 1424 × 2136 pixels, the proposed method detects grids in 247 images, yielding an average root-mean-square error (RMSE) of 5.0 pixels and an average intersection over union (IoU) of 0.990. On grids self-evaluated as being good, we report an average RMSE of 4.39 pixels and an average IoU of 0.991. To compare with the proposed bottom-up approach, we also develop three increasingly sophisticated top-down algorithms based on RANSAC-based model fitting. Experimental results show that our bottom-up algorithm yields better results than the top-down algorithms. 
We also demonstrate that using detected background grids for stitching different maps is visually better than both manual and SURF-based stitching.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"21 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138556858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}