从图像中提取结构：基于图的文档图像解释模型

Q4 Computer Science

Electronic Letters on Computer Vision and Image Analysis Pub Date : 2021-01-12 DOI:10.5565/REV/ELCVIA.1313

Pau Riba

{"title":"从图像中提取结构：基于图的文档图像解释模型","authors":"Pau Riba","doi":"10.5565/REV/ELCVIA.1313","DOIUrl":null,"url":null,"abstract":"From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in ℝn, is not properly defined for graphs. In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distilling Structure from Imagery: Graph-based Models for the Interpretation of Document Images\",\"authors\":\"Pau Riba\",\"doi\":\"10.5565/REV/ELCVIA.1313\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in ℝn, is not properly defined for graphs. In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.\",\"PeriodicalId\":38711,\"journal\":{\"name\":\"Electronic Letters on Computer Vision and Image Analysis\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Electronic Letters on Computer Vision and Image Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5565/REV/ELCVIA.1313\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Letters on Computer Vision and Image Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5565/REV/ELCVIA.1313","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}

引用次数: 0

摘要

从早期阶段起，模式识别和计算机视觉社区就考虑了在理解图像时利用结构信息的重要性。通常，图被认为是表示这类信息的合适模型，因为它们具有灵活性和表示能力，能够编码组件、对象或实体及其成对关系。尽管图已经成功地应用于各种各样的任务，但由于其符号和关系性质，与统计方法相比，图总是受到一些限制。事实上，一些琐碎的数学运算在图域中没有等价性。例如，在许多模式识别应用程序的核心中，需要比较两个对象。此操作在考虑中定义的特征向量时是微不足道的ℝn、没有为图正确定义。在本文中，我们从两个角度研究了结构信息的重要性，即传统的基于图的方法和几何深度学习的新进展。一方面，我们探讨了定义图表示的问题，以及如何在大规模和有噪声的场景中处理它。另一方面，图神经网络被提议首先将图编辑距离方法重新定义为一个度量学习问题，其次，将其应用于真实用例场景中，用于检测定义发票文档中表格的重复模式。作为实验框架，我们已经验证了在文档图像分析和识别领域的不同方法学贡献。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Distilling Structure from Imagery: Graph-based Models for the Interpretation of Document Images

From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in ℝn, is not properly defined for graphs. In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Electronic Letters on Computer Vision and Image Analysis Computer Science-Computer Vision and Pattern Recognition

CiteScore

2.50

自引率

0.00%

发文量

审稿时长

12 weeks