发票文件分析中的词法标注方法

Y. Belaïd, A. Belaïd
{"title":"发票文件分析中的词法标注方法","authors":"Y. Belaïd, A. Belaïd","doi":"10.1109/ICPR.2004.1334166","DOIUrl":null,"url":null,"abstract":"In this paper a morphological tagging approach for document image invoice analysis is described. Tokens close by their morphology and confirmed in their location within different similar contexts make apparent some parts of speech representative of the structure elements. This bottom up approach avoids the use of an priori knowledge provided that there are redundant and frequent contexts in the text. The approach is applied on the invoice body text roughly recognized by OCR and automatically segmented. The method makes possible the detection of the invoice articles and their different fields. The regularity of the article composition and its redundancy in the invoice is a good help for its structure. The recognition rate of 276 invoices and 1704 articles, is over than 91.02% for articles and 92.56% for fields.","PeriodicalId":74516,"journal":{"name":"Proceedings of the ... IAPR International Conference on Pattern Recognition. International Conference on Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Morphological Tagging Approach in Document Analysis of Invoices\",\"authors\":\"Y. Belaïd, A. Belaïd\",\"doi\":\"10.1109/ICPR.2004.1334166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper a morphological tagging approach for document image invoice analysis is described. Tokens close by their morphology and confirmed in their location within different similar contexts make apparent some parts of speech representative of the structure elements. This bottom up approach avoids the use of an priori knowledge provided that there are redundant and frequent contexts in the text. The approach is applied on the invoice body text roughly recognized by OCR and automatically segmented. The method makes possible the detection of the invoice articles and their different fields. The regularity of the article composition and its redundancy in the invoice is a good help for its structure. The recognition rate of 276 invoices and 1704 articles, is over than 91.02% for articles and 92.56% for fields.\",\"PeriodicalId\":74516,\"journal\":{\"name\":\"Proceedings of the ... IAPR International Conference on Pattern Recognition. International Conference on Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... IAPR International Conference on Pattern Recognition. International Conference on Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPR.2004.1334166\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... IAPR International Conference on Pattern Recognition. International Conference on Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPR.2004.1334166","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

摘要

本文描述了一种用于文档图像发票分析的形态学标记方法。在不同的相似语境中,接近其形态并在其位置上得到确认的标记使某些具有结构要素代表性的言语部分变得明显。这种自下而上的方法避免了在文本中存在冗余和频繁上下文的情况下使用先验知识。将该方法应用于OCR粗略识别的发票正文文本,并对其进行自动分割。该方法可实现发票物品及其不同字段的检测。发票中物品组成的规律性及其冗余性对其结构有很好的帮助。276张发票,1704篇文章,文章识别率超过91.02%,领域识别率超过92.56%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Morphological Tagging Approach in Document Analysis of Invoices
In this paper a morphological tagging approach for document image invoice analysis is described. Tokens close by their morphology and confirmed in their location within different similar contexts make apparent some parts of speech representative of the structure elements. This bottom up approach avoids the use of an priori knowledge provided that there are redundant and frequent contexts in the text. The approach is applied on the invoice body text roughly recognized by OCR and automatically segmented. The method makes possible the detection of the invoice articles and their different fields. The regularity of the article composition and its redundancy in the invoice is a good help for its structure. The recognition rate of 276 invoices and 1704 articles, is over than 91.02% for articles and 92.56% for fields.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
3.70
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信