A Study on Information Extraction: Application to Administrative Document Images

2022 9th NAFOSTED Conference on Information and Computer Science (NICS). Pub Date: 2022-10-31. DOI: 10.1109/NICS56915.2022.10013381

Huu Thang Nguyen, Cong Linh Le, Hoai-Nam Tran, T. A. Tran


Abstract

This paper studies the problem of information extraction and applies it to building an information extraction system for administrative documents. The proposed end-to-end system comprises three main modules: text detection (TD), optical character recognition (OCR), and information extraction (IE). We developed the IE module ourselves, building on two graph neural network architectures, GraphSAGE and GATs. We made a number of changes and improvements, such as redesigning the graph modeling and node representation to suit the goals and problems at hand. Although we studied what is needed to establish a complete information extraction system, we focused in depth on the information extraction module rather than on every module in the system. In addition, we built and evaluated our own dataset of Vietnamese Administrative Documents Images (VADI2021).
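To make the idea of graph modeling over OCR output concrete, the following is a minimal sketch and not the authors' implementation. It assumes each detected text box becomes a graph node, that edges connect each box to its nearest neighbours by centre distance, and that node features are updated with a single GraphSAGE-style mean-aggregation step. The function names, the k-nearest-neighbour edge rule, the toy feature vectors, and the identity weight matrices are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math

def knn_edges(centers, k=1):
    """Hypothetical edge rule: link each text box to its k nearest boxes."""
    neighbors = {}
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One GraphSAGE-style step: ReLU(W_self h_v + W_neigh mean(h_u))."""
    out = []
    for v, h_v in enumerate(features):
        nbrs = neighbors[v]
        dim = len(h_v)
        if nbrs:
            mean = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbour signal
        own = matvec(W_self, h_v)
        agg = matvec(W_neigh, mean)
        out.append([max(0.0, a + b) for a, b in zip(own, agg)])
    return out

# Toy layout: two text boxes on one line, two more on a line further down.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 5.0), (1.0, 5.0)]
# Placeholder node features (in practice: text and layout embeddings).
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights

neighbors = knn_edges(centers, k=1)
hidden = sage_mean_layer(features, neighbors, identity, identity)
```

In a full IE module, the updated node vectors would feed a classifier that assigns each text box a field label; a GAT-based variant would replace the plain mean with attention-weighted aggregation over the same neighbours.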


