ICIX: A Semantic Information Extraction Architecture

Proceedings of the 25th International Database Engineering & Applications Symposium. Pub Date: 2021-07-14. DOI: 10.1145/3472163.3472174

Angel L. Garrido, Álvaro Peiró, Carlos Bobed, E. Mena, Cristian Morte



Abstract

Public and private organizations produce and store huge numbers of documents that contain information about their domains in unstructured formats. Although end users can rely on various retrieval tools to access such data, progressively structuring these documents brings important benefits for daily operations. While many approaches exist for extracting information in open domains, we lack tools flexible enough to adapt to the particularities of different domains. In this paper, we present the design and implementation of ICIX, an architecture for extracting structured information from text documents. ICIX aims to obtain specific information within a given domain, which is defined by means of an ontology that guides the extraction process. In addition, to optimize the extraction, ICIX relies on document classification and data curation adapted to the particular domain. Our proposal has been implemented and evaluated in the specific context of managing legal documents, with promising results.
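
The three stages named in the abstract (document classification, ontology-guided extraction, and data curation) can be illustrated with a minimal sketch. This is not the ICIX implementation, which the abstract does not detail. The `Concept` class, the `ONTOLOGY` list, the keyword-count classifier, the regular expressions, and the sample documents are all hypothetical stand-ins chosen only to show how an ontology might drive which fields are extracted for each document class.

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    """Hypothetical ontology concept: a document class plus the properties to extract."""
    name: str
    keywords: tuple    # words that hint at this document class
    properties: dict   # property name -> regex with exactly one capture group


# Toy stand-in for a domain ontology (assumption: a real system would load OWL/RDF).
ONTOLOGY = [
    Concept(
        name="PurchaseDeed",
        keywords=("purchase", "buyer", "seller"),
        properties={
            "buyer": r"buyer,\s+([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
            "price": r"price of\s+([\d.,]+)\s+euros",
        },
    ),
    Concept(
        name="PowerOfAttorney",
        keywords=("power of attorney", "grantor"),
        properties={"grantor": r"grantor,\s+([A-Z][a-z]+(?: [A-Z][a-z]+)*)"},
    ),
]


def classify(text):
    """Pick the concept whose keywords occur most often (a naive classifier)."""
    lowered = text.lower()
    return max(ONTOLOGY, key=lambda c: sum(lowered.count(k) for k in c.keywords))


def curate(prop, value):
    """Normalize an extracted value; prices lose their thousands separators."""
    value = value.strip().rstrip(".,")
    if prop == "price":
        return value.replace(",", "")
    return value


def extract(text):
    """Classify the document, then extract only the properties its concept defines."""
    concept = classify(text)
    record = {"class": concept.name}
    for prop, pattern in concept.properties.items():
        match = re.search(pattern, text)
        if match:
            record[prop] = curate(prop, match.group(1))
    return record


deed = ("Deed of purchase. The buyer, Maria Lopez, agrees with the seller "
        "to a price of 120,000 euros.")
poa = "Power of attorney granted by the grantor, Juan Perez, before a notary."

if __name__ == "__main__":
    print(extract(deed))
    print(extract(poa))
```

The point of the design is that the extraction code never names a specific field. Adapting the sketch to another domain means changing only the `ONTOLOGY` entries, which mirrors the domain flexibility the abstract claims for the ontology-guided approach.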


