CALM: Common-Sense Knowledge Augmentation for Document Image Understanding

Proceedings of the 30th ACM International Conference on Multimedia. Pub Date: 2022-10-10. DOI: 10.1145/3503161.3548321

Qinyi Du, Qingqing Wang, Keqian Li, Jidong Tian, Liqiang Xiao, Yaohui Jin


Citations: 1

Abstract


In recent years, encoding multi-modal information has significantly improved the performance of document image understanding. However, existing work relies heavily on the superficial appearance of the observed data, which leads to counter-intuitive model behavior in many critical cases. To overcome this issue, this paper proposes CALM, a common-sense knowledge augmented model for document image understanding tasks. It first produces purified representations of document contents to extract key information and learn common-sense augmented representations of the inputs. Then, relevant common-sense knowledge is extracted from the external ConceptNet knowledge base, and a derived knowledge graph is built to jointly enhance the common-sense reasoning capability of CALM. To further highlight the importance of common-sense knowledge in document image understanding, we propose CS-DVQA, the first question-answering dataset focused on common-sense reasoning for document images, in which questions are answered by taking both document contents and common-sense knowledge into consideration. In extensive evaluation, CALM outperforms state-of-the-art models on three document image understanding tasks: key information extraction (from 85.37 to 86.52), document image classification (from 96.08 to 96.17), and document visual question answering (from 86.72 to 88.03).
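The abstract mentions extracting relevant knowledge from ConceptNet and building a derived knowledge graph, but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' method: it assumes the document has already been reduced to a list of tokens, and it uses a small hand-written list of ConceptNet-style triples in place of the real knowledge base. The names `TOY_TRIPLES` and `build_derived_graph` and the `hops` parameter are hypothetical, introduced purely for illustration.

```python
from collections import defaultdict

# Toy ConceptNet-style triples (head, relation, tail). Illustrative only:
# the relation names follow ConceptNet conventions, but these rows are
# hand-written for this example, not retrieved from ConceptNet.
TOY_TRIPLES = [
    ("receipt", "IsA", "document"),
    ("receipt", "RelatedTo", "purchase"),
    ("total", "RelatedTo", "sum"),
    ("purchase", "RelatedTo", "money"),
    ("cat", "IsA", "animal"),
]


def build_derived_graph(tokens, triples, hops=1):
    """Collect the triples reachable within `hops` steps of the document tokens.

    Only head -> tail edges are followed. Returns a dict mapping each head
    concept to a list of (relation, tail) pairs.
    """
    frontier = {t.lower() for t in tokens}  # concepts to expand in this hop
    seen = set(frontier)                    # concepts already visited
    graph = defaultdict(list)
    for _ in range(hops):
        next_frontier = set()
        for head, rel, tail in triples:
            if head in frontier:
                graph[head].append((rel, tail))
                if tail not in seen:
                    next_frontier.add(tail)
        seen |= next_frontier
        frontier = next_frontier
    return dict(graph)


if __name__ == "__main__":
    # Expand two document tokens by two hops and show the resulting subgraph.
    print(build_derived_graph(["Receipt", "Total"], TOY_TRIPLES, hops=2))
```

In this sketch, concepts unrelated to the document tokens (such as `cat`) never enter the derived graph, which is the point of restricting the external knowledge to a document-specific subgraph. How CALM actually selects, filters, and encodes this knowledge is not stated in the abstract.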
