ICIX: A Semantic Information Extraction Architecture

Angel L. Garrido, Álvaro Peiró, Carlos Bobed, E. Mena, Cristian Morte
Proceedings of the 25th International Database Engineering & Applications Symposium, published 2021-07-14
DOI: https://doi.org/10.1145/3472163.3472174

Abstract

Public and private organizations produce and store large volumes of documents that hold information about their domains in unstructured formats. Although end users can rely on various retrieval tools to access such data, progressively structuring these documents brings important benefits for daily operations. Many approaches exist for extracting information in open domains, but tools flexible enough to adapt to the particularities of different domains are lacking. In this paper, we present the design and implementation of ICIX, an architecture for extracting structured information from text documents. ICIX targets specific information within a given domain, which is defined by an ontology that guides the extraction process. To optimize the extraction, ICIX also relies on document classification and data curation adapted to the particular domain. Our proposal has been implemented and evaluated in the context of managing legal documents, with promising results.
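
As an illustration only, the three stages the abstract names (document classification, ontology-guided extraction, and data curation) might be chained as in the sketch below. This is not the ICIX implementation: the `Concept` class, the toy legal ontology, the keyword-overlap classifier, and the regular expressions are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Concept:
    """Hypothetical ontology concept with the properties to extract for it."""
    name: str
    keywords: Set[str]            # used only by the toy classifier
    properties: Dict[str, str]    # property name -> regex with one capture group


# Toy "ontology" for a legal domain; every name and pattern is an assumption.
ONTOLOGY: List[Concept] = [
    Concept(
        name="SaleDeed",
        keywords={"seller", "buyer", "deed"},
        properties={
            "seller": r"seller[:,]?\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)",
            "price": r"price of\s+([\d.,]+)\s*euros",
        },
    ),
    Concept(
        name="PowerOfAttorney",
        keywords={"grantor", "attorney", "powers"},
        properties={
            "grantor": r"grantor[:,]?\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)",
        },
    ),
]


def classify(text: str) -> Concept:
    """Pick the concept whose keywords overlap most with the document's words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return max(ONTOLOGY, key=lambda c: len(c.keywords & words))


def extract(text: str, concept: Concept) -> Dict[str, str]:
    """Extract only the properties that the ontology defines for this concept."""
    found: Dict[str, str] = {}
    for prop, pattern in concept.properties.items():
        match = re.search(pattern, text)
        if match:
            found[prop] = match.group(1)
    return found


def curate(record: Dict[str, str]) -> Dict[str, str]:
    """Normalize extracted values (here: trim and drop thousands separators)."""
    cleaned = {key: value.strip() for key, value in record.items()}
    if "price" in cleaned:
        cleaned["price"] = cleaned["price"].replace(",", "")
    return cleaned


def process(text: str) -> Dict[str, str]:
    """Run classification, ontology-guided extraction, then curation."""
    concept = classify(text)
    record = curate(extract(text, concept))
    return {"type": concept.name, **record}
```

In this sketch the classification step decides which part of the ontology applies, so the extractor only looks for properties that are meaningful for that document type. A real system would replace the keyword overlap and regular expressions with trained classifiers and richer extraction rules.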