Using natural language to integrate, evaluate, and optimize extracted knowledge bases
Authors: Doug Downey, Chandra Bhagavatula, A. Yates
Venue: Conference on Automated Knowledge Base Construction
Publication date: 2013-10-27
DOI: 10.1145/2509558.2509569 (https://doi.org/10.1145/2509558.2509569)
Citation count: 3
Abstract
Web Information Extraction (WIE) systems extract billions of unique facts, but integrating these assertions into a coherent knowledge base and evaluating them across different WIE techniques remain open challenges. We propose a framework that uses natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language, and are evaluated by how well their output distributions predict held-out text. We describe the advantages of the approach and detail the remaining research challenges.
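The evaluation idea in the abstract (score a KB by how well its distribution over language predicts held-out text) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: each KB is reduced to a smoothed unigram distribution over tokens, integration is modelled as a linear mixture of those distributions, and predictive quality is measured as perplexity. The function names `unigram_dist`, `mix`, and `perplexity`, and the toy sentences, are hypothetical.

```python
import math
from collections import Counter


def unigram_dist(sentences, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumption: a KB is stood in for by the text of its assertions.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def mix(dists, weights):
    """Linear interpolation of distributions (a hypothetical integration step)."""
    vocab = dists[0].keys()
    return {w: sum(wt * d[w] for d, wt in zip(dists, weights)) for w in vocab}


def perplexity(dist, held_out):
    """Perplexity of held-out text under a token distribution (lower is better)."""
    tokens = [tok for s in held_out for tok in s.split()]
    log_prob = sum(math.log(dist[tok]) for tok in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy stand-ins for the assertions of two extracted KBs, plus held-out text.
kb_a_text = ["paris is the capital of france", "berlin is the capital of germany"]
kb_b_text = ["einstein was born in ulm", "curie was born in warsaw"]
held_out = ["rome is the capital of italy", "newton was born in woolsthorpe"]

# Shared vocabulary so that every held-out token has non-zero probability.
vocab = {tok for s in kb_a_text + kb_b_text + held_out for tok in s.split()}

dist_a = unigram_dist(kb_a_text, vocab)
dist_b = unigram_dist(kb_b_text, vocab)
dist_mix = mix([dist_a, dist_b], [0.5, 0.5])

ppl_a = perplexity(dist_a, held_out)
ppl_b = perplexity(dist_b, held_out)
ppl_mix = perplexity(dist_mix, held_out)

print(f"KB A perplexity:  {ppl_a:.2f}")
print(f"KB B perplexity:  {ppl_b:.2f}")
print(f"Mixed perplexity: {ppl_mix:.2f}")
```

Because every KB is scored on the same held-out text, systems built with different extraction techniques become directly comparable, which is the property the abstract highlights. A realistic system would presumably use far richer language models than unigrams; the sketch only shows the shape of the comparison.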