International Journal on Document Analysis and Recognition: latest articles (page 2)

DocXclassifier: towards a robust and interpretable deep neural network for document image classification

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-06-25 DOI: 10.1007/s10032-024-00483-w

Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed

Abstract: Model interpretability and robustness are becoming increasingly critical today for the safe and practical deployment of deep learning (DL) models in industrial settings. As DL-backed automated document processing systems become increasingly common in business workflows, there is a pressing need today to enhance interpretability and robustness for the task of document image classification, an integral component of such systems. Surprisingly, while much research has been devoted to improving the performance of deep models for this task, little attention has been given to their interpretability and robustness. In this paper, we aim to improve upon both aspects and introduce two inherently interpretable deep document classifiers, DocXClassifier and DocXClassifierFPN, both of which not only achieve significant performance improvements over existing approaches but also hold the capability to simultaneously generate feature importance maps while making their predictions. Our approach involves integrating a convolutional neural network (ConvNet) backbone with an attention mechanism to perform weighted aggregation of features based on their importance to the class, enabling the generation of interpretable importance maps. Additionally, we propose integrating Feature Pyramid Networks with the attention mechanism to significantly enhance the resolution of the interpretability maps, especially for pyramidal ConvNet architectures. Our approach attains state-of-the-art performance in image-based classification on two popular document datasets, RVL-CDIP and Tobacco3482, with top-1 classification accuracies of 94.19% and 95.71%, respectively. Additionally, it sets a new record for the highest image-based classification accuracy on Tobacco3482 without transfer learning from RVL-CDIP, at 90.29%. In addition, our proposed training strategy demonstrates superior robustness compared to existing approaches, significantly outperforming them on 19 out of 21 different types of novel data distortions, while achieving comparable results on the remaining two. By combining robustness with interpretability, DocXClassifier presents a promising step toward the practical deployment of DL models for document classification tasks.
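The abstract above mentions attention-based weighted aggregation of ConvNet features, where the attention weights double as an importance map. As a rough illustration only, the following sketch shows generic dot-product attention pooling over a list of per-location feature vectors. It is not the authors' architecture: the names `attention_pool` and `query` are hypothetical, and the `query` vector stands in for whatever learned parameters a real model would use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool N feature vectors (one per spatial location) into one vector.

    features: list of N vectors, each of dimension D (assumed input layout)
    query:    vector of dimension D (hypothetical learned parameter)
    Returns (pooled, weights); the weights sum to 1 and can be read as a
    per-location importance map.
    """
    # Score each location by its dot product with the query vector.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the feature vectors, one output value per dimension.
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    pooled, weights = attention_pool(feats, [0.0, 0.0])
    print(pooled, weights)  # a zero query scores both locations equally
```

Reshaping `weights` back to the spatial grid of the feature map is what would give a heat map over the document image in this simplified setting.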

Citations: 0

Handwritten stenography recognition and the LION dataset

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-06-15 DOI: 10.1007/s10032-024-00479-6

Raphaela Heil, Malin Nauwerck

Abstract: In this paper, we establish the first baseline for handwritten stenography recognition, using the novel LION dataset, and investigate the impact of including selected aspects of stenographic theory into the recognition process. We make the LION dataset publicly available with the aim of encouraging future research in handwritten stenography recognition. A state-of-the-art text recognition model is trained to establish a baseline. Stenographic domain knowledge is integrated by transforming the target sequences into representations which approximate diplomatic transcriptions, wherein each symbol in the script is represented by its own character in the transliteration, as opposed to corresponding combinations of characters from the Swedish alphabet. Four such encoding schemes are evaluated and results are further improved by integrating a pre-training scheme, based on synthetic data. The baseline model achieves an average test character error rate (CER) of 29.81% and a word error rate (WER) of 55.14%. Test error rates are reduced significantly (p < 0.01) by combining stenography-specific target sequence encodings with pre-training and fine-tuning, yielding CERs in the range of 24.5–26% and WERs of 44.8–48.2%. An analysis of selected recognition errors illustrates the challenges that the stenographic writing system poses to text recognition. This work establishes the first baseline for handwritten stenography recognition. Our proposed combination of integrating stenography-specific knowledge, in conjunction with pre-training and fine-tuning on synthetic data, yields considerable improvements. Together with our precursor study on the subject, this is the first work to apply modern handwritten text recognition to stenography. The dataset and our code are publicly available via Zenodo.
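For readers unfamiliar with the metrics quoted above, character error rate (CER) and word error rate (WER) are commonly defined as the Levenshtein edit distance between reference and hypothesis, divided by the reference length in characters or words. The sketch below follows that common definition; it is not taken from the paper's code, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: edits per reference word (whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 3 edits / 6 characters = 0.5
    print(wer("the cat sat", "the cat sit"))
```

Because the denominator is the reference length, both rates can exceed 100% when the hypothesis contains many insertions.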

Citations: 0

Compactnet: a lightweight convolutional neural network for one-shot online signature verification

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-05-27 DOI: 10.1007/s10032-024-00478-7

Napa Sae-Bae, Nida Chatwattanasiri, Somkait Udomhunsakul

Citations: 0

Experimental study of rehearsal-based incremental classification of document streams

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-05-11 DOI: 10.1007/s10032-024-00467-w

Usman Malik, Muriel Visani, Nicolas Sidere, Mickael Coustaty, Aurelie Joseph

Abstract: This research work proposes a novel protocol for rehearsal-based incremental learning models for the classification of business document streams using deep learning and, in particular, transformer-based natural language processing techniques. When implementing a rehearsal-based incremental classification model, the questions raised most often for parameterizing the model relate to the number of instances from “old” classes (learned in previous training iterations) which need to be kept in memory and the optimal number of new classes to be learned at each iteration. In this paper, we propose an incremental learning protocol that involves training incremental models using a weight-sharing strategy between transformer model layers across incremental training iterations. We provide a thorough experimental study that enables us to determine optimal ranges for various parameters in the context of incremental classification of business document streams. We also study the effect of the order in which the classes are presented to the model for learning and the effects of class imbalance on the model’s performance. Our results reveal no significant difference in the performance of our incrementally trained model and its statically trained counterpart after all training iterations (especially when, in the presence of class imbalance, the most represented classes are learned first). In addition, our proposed approach shows an improvement of 1.55% and 3.66% over a baseline model on two business document datasets. Based on this experimental study, we provide a list of recommendations for researchers and developers for training rehearsal-based incremental classification models for business document streams. Our protocol can be further re-used for other final applications.

Citations: 0

Deformity removal from handwritten text documents using variable cycle GAN

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-05-07 DOI: 10.1007/s10032-024-00466-x

Shivangi Nigam, Adarsh Prasad Behera, Shekhar Verma, P. Nagabhushan

Abstract: Text recognition systems typically work well for printed documents but struggle with handwritten documents due to different writing styles, background complexities, added noise of image acquisition methods, and deformed text images such as strike-offs and underlines. These deformities change the structural information, making it difficult to restore the deformed images while maintaining the structural information and preserving the semantic dependencies of the local pixels. Current adversarial networks are unable to preserve the structural and semantic dependencies as they focus on individual pixel-to-pixel variation and encourage non-meaningful aspects of the images. To address this, we propose a Variable Cycle Generative Adversarial Network (VCGAN) that considers the perceptual quality of the images. By using a variable Content Loss (Top-k Variable Loss, TV_k), VCGAN preserves the inter-dependence of spatially close pixels while removing the strike-off strokes. The similarity of the images is computed with TV_k considering the intensity variations that do not interfere with the semantic structures of the image. Our results show that VCGAN can remove most deformities with an elevated F1 score of 97.40% and outperforms current state-of-the-art algorithms with a character error rate of 7.64% and word accuracy of 81.53% when tested on the handwritten text recognition system.

Citations: 0

Automated systems for diagnosis of dysgraphia in children: a survey and novel framework

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-04-15 DOI: 10.1007/s10032-024-00464-z

Jayakanth Kunhoth, Somaya Al-Maadeed, Suchithra Kunhoth, Younes Akbari, Moutaz Saleh

Abstract: Learning disabilities, which primarily interfere with basic learning skills such as reading, writing, and math, are known to affect around 10% of children in the world. The poor motor skills and motor coordination as part of the neurodevelopmental disorder can become a causative factor for the difficulty in learning to write (dysgraphia), hindering the academic track of an individual. The signs and symptoms of dysgraphia include but are not limited to irregular handwriting, improper handling of writing medium, slow or labored writing, unusual hand position, etc. The widely accepted assessment criterion for all types of learning disabilities including dysgraphia has traditionally relied on examinations conducted by medical experts. However, in recent years, artificial intelligence has been employed to develop diagnostic systems for learning disabilities, utilizing diverse modalities of data, including handwriting analysis. This work presents a review of the existing automated dysgraphia diagnosis systems for children in the literature. The main focus of the work is to review artificial intelligence-based systems for dysgraphia diagnosis in children. This work discusses the data collection method, important handwriting features, and machine learning algorithms employed in the literature for the diagnosis of dysgraphia. Apart from that, this article discusses some of the non-artificial intelligence-based automated systems. Furthermore, this article discusses the drawbacks of existing systems and proposes a novel framework for dysgraphia diagnosis and assistance evaluation.

Citations: 0

Children age group detection based on human–computer interaction and time series analysis

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-03-06 DOI: 10.1007/s10032-024-00462-1

Juan Carlos Ruiz-Garcia, Carlos Hojas, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Jaime Herreros-Rodriguez

Abstract: This article proposes a novel children–computer interaction (CCI) approach for the task of age group detection. This approach focuses on the automatic analysis of the time series generated from the interaction of the children with mobile devices. In particular, we extract a set of 25 time series related to spatial, pressure, and kinematic information of the children’s interaction while colouring a tree through a pen stylus tablet, a specific test from the large-scale public ChildCIdb database. A complete analysis of the proposed approach is carried out using different time series selection techniques to choose the most discriminative ones for the age group detection task: (i) a statistical analysis and (ii) an automatic algorithm called sequential forward search (SFS). In addition, different classification algorithms such as dynamic time warping barycenter averaging (DBA) and hidden Markov models (HMM) are studied. Accuracy results over 85% are achieved, outperforming previous approaches in the literature and in more challenging age group conditions. Finally, the approach presented in this study can benefit many children-related applications, for example, towards an age-appropriate environment with the technology.

Citations: 0

An unsupervised automatic organization method for Professor Shirakawa’s hand-notated documents of oracle bone inscriptions

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-03-05 DOI: 10.1007/s10032-024-00463-0

Xuebin Yue, Ziming Wang, Ryuto Ishibashi, Hayata Kaneko, Lin Meng

Abstract: As one of the most influential Chinese cultural researchers in the second half of the twentieth century, Professor Shirakawa was active in the research field of ancient Chinese characters. He has left behind many valuable research documents, especially his hand-notated oracle bone inscriptions (OBIs) documents. OBIs are among the world’s oldest characters and were used in the Shang Dynasty about 3600 years ago for divination and recording events. The organization of OBIs is helpful not only for better understanding Prof. Shirakawa’s research but also for the further study of OBIs in general and their importance in ancient Chinese history. This paper proposes an unsupervised automatic organization method to organize Prof. Shirakawa’s OBIs and construct a handwritten OBIs data set for neural network learning. First, a noise reduction suite is proposed to remove strangely shaped noise while reducing the data loss of OBIs. Secondly, a novel segmentation method based on the supervised classification of OBIs regions is proposed to reduce adverse effects between characters for more accurate OBIs segmentation. Thirdly, a unique unsupervised clustering method is proposed to classify the segmented characters. Finally, all the same characters in the hand-notated OBIs documents are organized together. The evaluation results show that the proposed noise reduction removes noise, including number information and closed-loop-like edges in the dataset, with an accuracy of 97.85%. In addition, the accuracy of supervised classification of OBIs regions based on our model achieves 85.50%, which is higher than eight state-of-the-art deep learning models, and a particular preprocessing method we proposed improves the classification accuracy by nearly 11.50%. The accuracy of OBIs clustering based on supervised classification achieves 74.91%. These results demonstrate the effectiveness of our proposed unsupervised automatic organization of Prof. Shirakawa’s hand-notated OBIs documents. The code and datasets are available at http://www.ihpc.se.ritsumei.ac.jp/obidataset.html.

Citations: 0

On the improvement of handwritten text line recognition with octave convolutional recurrent neural networks

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-02-20 DOI: 10.1007/s10032-024-00460-3

Dayvid Castro, Cleber Zanchettin, Luís A. Nunes Amaral

Abstract: Off-line handwritten text recognition (HTR) poses a significant challenge due to the complexities of variable handwriting styles, background degradation, and unconstrained word sequences. This work tackles the handwritten text line recognition problem using octave convolutional recurrent neural networks (OctCRNN). Our approach requires no word segmentation, preprocessing, or explicit feature extraction and leverages octave convolutions to process multiscale features without increasing the number of learnable parameters. We investigate the OctCRNN under different settings, including an octave design that efficiently balances computational cost and recognition performance. We thoroughly investigate the OctCRNN under different settings by formulating an experimental pipeline with a visualization step to get intuitions about how the model works compared to a counterpart based on traditional convolutions. The system becomes complete by adding a language model to increase linguistic knowledge. Finally, we assess the performance of our solution using character and word error rates against established handwritten text recognition benchmarks: IAM, RIMES, and ICFHR 2016 READ. According to the results, our proposal achieves state-of-the-art performance while reducing the computational requirements. Our findings suggest that the architecture provides a robust framework for building HTR systems.

Citations: 0

Training transformer architectures on few annotated data: an application to historical handwritten text recognition

IF 2.3, Zone 4 (Computer Science)

International Journal on Document Analysis and Recognition Pub Date : 2024-01-25 DOI: 10.1007/s10032-023-00459-2

Killian Barrere, Yann Soullard, Aurélie Lemaitre, Bertrand Coüasnon

Abstract: Transformer-based architectures show excellent results on the task of handwritten text recognition, becoming the standard architecture for modern datasets. However, they require a significant amount of annotated data to achieve competitive results. They typically rely on synthetic data to solve this problem. Historical handwritten text recognition represents a challenging task due to degradations, specific handwritings for which few examples are available and ancient languages that vary over time. These limitations also make it difficult to generate realistic synthetic data. Given sufficient and appropriate data, Transformer-based architectures could alleviate these concerns, thanks to their ability to have a global view of textual images and their language modeling capabilities. In this paper, we propose the use of a lightweight Transformer model to tackle the task of historical handwritten text recognition. To train the architecture, we introduce realistic looking synthetic data reproducing the style of historical handwritings. We present a specific strategy, both for training and prediction, to deal with historical documents, where only a limited amount of training data are available. We evaluate our approach on the ICFHR 2018 READ dataset which is dedicated to handwriting recognition in specific historical documents. The results show that our Transformer-based approach is able to outperform existing methods.

Citations: 0