INDREX:数据库内分布关系提取

T. Kilias, Alexander Löser, Periklis Andritsos
{"title":"INDREX:数据库内分布关系提取","authors":"T. Kilias, Alexander Löser, Periklis Andritsos","doi":"10.1145/2513190.2513196","DOIUrl":null,"url":null,"abstract":"Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as SystemT by IBM or the open source system GATE solve this task with handcrafted rule sets that the system executes document-by-document. Thereby the user must execute a highly interactive and iterative process of reading a document, of expressing rules, of testing these rules on the next document and of refining rules. Until now, these systems do neither leverage the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS that would enable a user interactive rule refinement across documents and on the entire corpus. We propose the INDREX system that enables a user for the first time to describe corpus-wide extraction tasks in a declarative language and permits the user to run interactive rule refinement queries. For enabling this powerful functionality we extend a standard PostgreSQL with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and rules in the same RDBMS that already holds domain specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction and (3) the INDREX system can leverage the full power of built-in indexing and query optimization techniques of the underlaying RDBMS. In a preliminary study we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"215 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"INDREX: in-database distributional relation extraction\",\"authors\":\"T. Kilias, Alexander Löser, Periklis Andritsos\",\"doi\":\"10.1145/2513190.2513196\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as SystemT by IBM or the open source system GATE solve this task with handcrafted rule sets that the system executes document-by-document. Thereby the user must execute a highly interactive and iterative process of reading a document, of expressing rules, of testing these rules on the next document and of refining rules. Until now, these systems do neither leverage the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS that would enable a user interactive rule refinement across documents and on the entire corpus. We propose the INDREX system that enables a user for the first time to describe corpus-wide extraction tasks in a declarative language and permits the user to run interactive rule refinement queries. For enabling this powerful functionality we extend a standard PostgreSQL with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and rules in the same RDBMS that already holds domain specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction and (3) the INDREX system can leverage the full power of built-in indexing and query optimization techniques of the underlaying RDBMS. In a preliminary study we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.\",\"PeriodicalId\":335396,\"journal\":{\"name\":\"International Workshop on Data Warehousing and OLAP\",\"volume\":\"215 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Workshop on Data Warehousing and OLAP\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2513190.2513196\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Data Warehousing and OLAP","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2513190.2513196","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

关系提取将关系的文本表示转换为数据仓库的关系模型。早期的系统,如IBM的SystemT或开放源码系统GATE,通过系统逐个文档执行的手工制作的规则集来解决这个任务。因此,用户必须执行一个高度交互和迭代的过程,包括阅读文档、表达规则、在下一个文档上测试这些规则以及改进规则。到目前为止,这些系统既没有充分利用内置声明性查询语言的潜力,也没有利用现代RDBMS的索引和查询优化技术,这些技术可以使用户在文档和整个语料库上进行交互式规则优化。我们提出了INDREX系统,该系统首次使用户能够用声明性语言描述语料库范围内的提取任务,并允许用户运行交互式规则细化查询。为了实现这个强大的功能,我们用一组白盒用户定义函数扩展了标准PostgreSQL,这些函数支持从句子到关系的语料库范围内的转换。我们将文本语料库和规则存储在已经保存特定领域结构化数据的相同RDBMS中。因此,(1)用户可以利用这些数据进一步使规则适应目标域,(2)用户不需要额外的系统来提取规则,(3)INDREX系统可以充分利用底层RDBMS的内置索引和查询优化技术的全部功能。在初步研究中,我们报告了这种破坏性方法的可行性,并在路透社语料库第1卷上展示了INDREX中的多个查询。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
INDREX: in-database distributional relation extraction
Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as SystemT by IBM or the open source system GATE solve this task with handcrafted rule sets that the system executes document-by-document. Thereby the user must execute a highly interactive and iterative process of reading a document, of expressing rules, of testing these rules on the next document and of refining rules. Until now, these systems do neither leverage the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS that would enable a user interactive rule refinement across documents and on the entire corpus. We propose the INDREX system that enables a user for the first time to describe corpus-wide extraction tasks in a declarative language and permits the user to run interactive rule refinement queries. For enabling this powerful functionality we extend a standard PostgreSQL with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and rules in the same RDBMS that already holds domain specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction and (3) the INDREX system can leverage the full power of built-in indexing and query optimization techniques of the underlaying RDBMS. In a preliminary study we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信