A Hybrid Method for Open Information Extraction Based on Shallow and Deep Linguistic Analysis

Vahideh Reshadat, Maryam Hoorali, Heshaam Faili
Interdisciplinary Information Sciences, Vol. 22, No. 1, pp. 87-100 (2016). DOI: 10.4036/IIS.2016.R.03
Citations: 12

Abstract

Open Information Extraction (Open IE) is a relation-independent extraction paradigm that extracts assertions from massive, heterogeneous corpora such as the Web. Lightweight relation extractors focus on efficiency by restricting analysis to shallow linguistic tools such as part-of-speech tagging. Although these methods are fast and scalable, they cannot handle complex sentences (for example, those with complicated or long-distance relations) because they rely only on shallow syntactic features. This paper presents two novel hybrid methods, TextRunner-DepOE (TR-DOE) and ReVerb-DepOE (RV-DOE), which combine a high-performance subset of shallow Open IE systems with the strengths of a deep Open IE system. We find the best trade-off between precision and recall by tuning two combination parameters: sentence length and confidence measure. Because efficient use of time is a priority, we use a fast and robust deep extractor. Experiments indicate that the proposed hybrid methods perform significantly better than their constituent systems. The best result was obtained by TR-DOE, whose F-measure was almost twice that of TextRunner.
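The abstract names the two combination parameters (sentence length and confidence measure) but does not give the exact combination rule. As a minimal sketch, assuming a simple routing rule that is not necessarily the authors' method, the following Python code sends short sentences to a shallow extractor, keeps only its confident extractions, falls back to a deep extractor otherwise, and grid-searches both thresholds for the best F-measure on labeled data. The names `hybrid_extract`, `f_measure`, `tune`, `toy_shallow`, and `toy_deep`, the whitespace tokenization, and the toy data are all hypothetical; the toy extractors merely stand in for systems such as TextRunner, ReVerb, and DepOE.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical representation of an Open IE assertion: (arg1, relation, arg2).
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Extraction:
    triple: Triple
    confidence: float  # extractor-assigned score; assumed to lie in [0, 1]


# An extractor maps one sentence to a list of scored extractions.
Extractor = Callable[[str], List[Extraction]]


def hybrid_extract(sentence: str, shallow: Extractor, deep: Extractor,
                   max_len: int, min_conf: float) -> Set[Triple]:
    """Hypothetical routing rule, not the paper's exact method.

    Short sentences (<= max_len whitespace tokens) go to the fast shallow
    extractor, whose extractions are kept only if their confidence reaches
    min_conf. Long sentences, or short ones where no shallow extraction
    survives the filter, fall back to the deep extractor.
    """
    if len(sentence.split()) <= max_len:
        kept = {e.triple for e in shallow(sentence) if e.confidence >= min_conf}
        if kept:
            return kept
    return {e.triple for e in deep(sentence)}


def f_measure(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Balanced F-measure (F1) of predicted triples against a gold set."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def tune(sentences: Iterable[str], gold: Set[Triple], shallow: Extractor,
         deep: Extractor, lengths: Iterable[int],
         confs: Iterable[float]) -> Tuple[float, int, float]:
    """Grid-search both combination parameters; return (F1, max_len, min_conf)."""
    sentences = list(sentences)
    best: Optional[Tuple[float, int, float]] = None
    for max_len, min_conf in product(lengths, confs):
        predicted: Set[Triple] = set()
        for s in sentences:
            predicted |= hybrid_extract(s, shallow, deep, max_len, min_conf)
        score = f_measure(predicted, gold)
        if best is None or score > best[0]:  # the first best grid point wins ties
            best = (score, max_len, min_conf)
    assert best is not None, "empty parameter grid"
    return best


# Toy stand-ins for the real systems (purely illustrative).
def toy_shallow(sentence: str) -> List[Extraction]:
    # Naive pattern: first word = arg1, second = relation, rest = arg2;
    # confidence drops on longer sentences.
    w = sentence.rstrip(".").split()
    return [Extraction((w[0], w[1], " ".join(w[2:])), 0.9 if len(w) <= 4 else 0.4)]


def toy_deep(sentence: str) -> List[Extraction]:
    # Pretends a dependency parse links the verb to its head noun (last word).
    w = sentence.rstrip(".").split()
    return [Extraction((w[0], w[1], w[-1]), 0.8)]


SENTENCES = ["Edison invented the phonograph.", "Curie isolated in 1902 radium."]
GOLD = {("Edison", "invented", "the phonograph"), ("Curie", "isolated", "radium")}

print(tune(SENTENCES, GOLD, toy_shallow, toy_deep, [0, 4, 10], [0.0, 0.5]))
# → (1.0, 4, 0.0)
```

On this toy data, the shallow-only setting (large `max_len`, zero `min_conf`) and the deep-only setting (`max_len = 0`) each reach an F-measure of only 0.5, while the tuned hybrid recovers both gold triples. Ties go to the first grid point here; a real tuning run might instead prefer the setting that routes more sentences to the cheaper shallow path.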