基于变压器的发票文本识别

Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning Pub Date : 2023-03-17 DOI:10.1145/3590003.3590034

Yanlan Chen

{"title":"基于变压器的发票文本识别","authors":"Yanlan Chen","doi":"10.1145/3590003.3590034","DOIUrl":null,"url":null,"abstract":"A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods with RNN-like structures to model semantic information. However, RNN-based approaches have some obvious drawbacks, such as the level-by-level decoding approach and the one-way serial transmission of semantic information, which greatly limit semantic information's effectiveness and computational efficiency. In contrast, invoice text has obvious contextual relationships due to its fixed text pattern, the text font in the invoice is more fixed and the complexity of the background is much lower than that of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by Transformer [1]. Self-attention-based architectures, in particular Transformer, have been successful in natural language processing (NLP). It has demonstrated powerful semantic information modeling capabilities in NLP. Inspired by its success, we try to apply Transformer to invoice text recognition. Unlike the RNN-based approach, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer Attention module to capture the semantic information, and use the Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. Meanwhile, we collected text images of more than 40,000 train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in terms of accuracy and speed.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"TIRec: Transformer-based Invoice Text Recognition\",\"authors\":\"Yanlan Chen\",\"doi\":\"10.1145/3590003.3590034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods with RNN-like structures to model semantic information. However, RNN-based approaches have some obvious drawbacks, such as the level-by-level decoding approach and the one-way serial transmission of semantic information, which greatly limit semantic information's effectiveness and computational efficiency. In contrast, invoice text has obvious contextual relationships due to its fixed text pattern, the text font in the invoice is more fixed and the complexity of the background is much lower than that of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by Transformer [1]. Self-attention-based architectures, in particular Transformer, have been successful in natural language processing (NLP). It has demonstrated powerful semantic information modeling capabilities in NLP. Inspired by its success, we try to apply Transformer to invoice text recognition. Unlike the RNN-based approach, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer Attention module to capture the semantic information, and use the Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. Meanwhile, we collected text images of more than 40,000 train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in terms of accuracy and speed.\",\"PeriodicalId\":340225,\"journal\":{\"name\":\"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3590003.3590034\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3590003.3590034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

提出了一种新的发票文本识别模型。在过去的几年里，研究人员探索了使用类rnn结构来建模语义信息的文本识别方法。然而，基于rnn的方法存在一些明显的缺陷，如逐级解码方式和语义信息的单向串行传输，这极大地限制了语义信息的有效性和计算效率。相比之下，发票文本由于其固定的文本模式，具有明显的上下文关系，发票中的文本字体更加固定，背景的复杂性远低于自然场景。为了进一步利用这些上下文关系并适应发票文本的特征，我们提出了一个受Transformer[1]启发的新的文本识别框架。基于自关注的体系结构，特别是Transformer，在自然语言处理(NLP)中已经取得了成功。它展示了在自然语言处理中强大的语义信息建模能力。受其成功的启发，我们尝试将Transformer应用于发票文本识别。与基于rnn的方法不同，我们减少了用于提取图像特征的视觉网络的参数，使用卷积视觉变压器注意力模块捕获语义信息，并使用变压器解码模块并行解码所有字符。我们希望这种基于transformer的体系结构能够更好地为发票中的语义信息建模，同时保持轻量级。同时，我们收集了4万多张火车发票、增值税发票、滚动发票和出租车发票的文字图像。在已收集的发票文本识别数据集上的实验表明，我们的方法在准确率和速度上都优于以前的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

TIRec: Transformer-based Invoice Text Recognition

A novel invoice text recognition model is proposed. In the past few years, researchers have explored text recognition methods with RNN-like structures to model semantic information. However, RNN-based approaches have some obvious drawbacks, such as the level-by-level decoding approach and the one-way serial transmission of semantic information, which greatly limit semantic information's effectiveness and computational efficiency. In contrast, invoice text has obvious contextual relationships due to its fixed text pattern, the text font in the invoice is more fixed and the complexity of the background is much lower than that of natural scenes. To further exploit these contextual relationships and adapt to the characteristics of invoice text, we propose a new text recognition framework inspired by Transformer [1]. Self-attention-based architectures, in particular Transformer, have been successful in natural language processing (NLP). It has demonstrated powerful semantic information modeling capabilities in NLP. Inspired by its success, we try to apply Transformer to invoice text recognition. Unlike the RNN-based approach, we reduce the parameters of the vision network used to extract image features, use the Convolutional Vision Transformer Attention module to capture the semantic information, and use the Transformer decoding module to decode all characters in parallel. We hope that this Transformer-based architecture can better model the semantic information in invoices while remaining lightweight. Meanwhile, we collected text images of more than 40,000 train invoices, VAT invoices, rolled invoices, and cab invoices. Experiments on the collected invoice text recognition dataset show that our approach outperforms previous methods in terms of accuracy and speed.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning

自引率

0.00%

发文量