Charles De Trogoff, Rim Hantach, Gisela Lechuga, P. Calvez
Published in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022-12-01. DOI: 10.1109/ICMLA55696.2022.00020 (https://doi.org/10.1109/ICMLA55696.2022.00020)
Automatic Key Information Extraction from Visually Rich Documents
Business document analysis, particularly of invoices, plays a vital role in companies, especially large ones. These documents are visually rich, contain little text, and come in many different layouts, so processing them with traditional techniques remains inefficient. One of the key challenges is therefore to exploit visual patterns between entities of interest. After an overview of the state of the art in this domain, we propose a graph-based model that recognizes specific text in invoices. First, an Encoder module creates a multimodal embedding for each text sequence based on textual, visual, and spatial information. This representation is then passed through a multi-layer graph attention network and finally used in a simple classification task. Experiments were conducted in order to improve the performance of the proposed approach.
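The pipeline described in the abstract (multimodal embedding per text sequence, stacked graph attention layers, then classification) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names (`encode`, `gat_layer`, `classify`, `run_demo`), the toy feature vectors, the neighbor lists, the single attention head, the ELU activation, the three output classes, and the random untrained weights are all assumptions made for illustration. The encoder here simply concatenates features, whereas the paper's Encoder module may work quite differently.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(text_vec, visual_vec, box):
    """Hypothetical encoder: concatenate textual, visual and spatial features."""
    return list(text_vec) + list(visual_vec) + list(box)


def gat_layer(H, neighbors, W, a):
    """One single-head graph attention layer.

    H: node feature vectors; neighbors[i]: indices node i attends to
    (including i itself); W: weight matrix; a: attention vector of
    length 2 * output dimension.
    """
    Z = [matvec(W, h) for h in H]  # shared linear projection of every node
    out = []
    for i, nbrs in enumerate(neighbors):
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, Z[i] + Z[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)  # attention weights over the neighborhood
        agg = [
            sum(w * Z[j][k] for w, j in zip(alpha, nbrs))
            for k in range(len(Z[i]))
        ]
        out.append([elu(v) for v in agg])
    return out


def classify(h, W_out):
    """Linear layer followed by softmax over entity classes."""
    return softmax(matvec(W_out, h))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def run_demo(seed=0):
    rng = random.Random(seed)
    # Three toy text segments: (text features, visual features, box x0, y0, x1, y1)
    segments = [
        ([1.0, 0.0], [0.3], [0.1, 0.10, 0.4, 0.15]),
        ([0.0, 1.0], [0.7], [0.5, 0.10, 0.9, 0.15]),
        ([0.5, 0.5], [0.2], [0.1, 0.80, 0.4, 0.85]),
    ]
    H = [encode(*s) for s in segments]
    neighbors = [[0, 1, 2], [0, 1], [0, 2]]  # assumed layout graph with self-loops
    hidden, n_classes = 4, 3
    for _ in range(2):  # two stacked attention layers
        W = random_matrix(hidden, len(H[0]), rng)
        a = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]
        H = gat_layer(H, neighbors, W, a)
    W_out = random_matrix(n_classes, hidden, rng)
    return [classify(h, W_out) for h in H]


if __name__ == "__main__":
    for idx, probs in enumerate(run_demo()):
        print(idx, [round(p, 3) for p in probs])
```

Because the weights are random, the printed class probabilities carry no meaning; the sketch only shows how information from neighboring text segments is mixed through attention before each segment is classified.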