Table Cell Search for Question Answering

Proceedings of the 25th International Conference on World Wide Web · Published: 2016-04-11 · DOI: 10.1145/2872427.2883080

Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, Xifeng Yan


Citations: 109

Abstract

Tables are pervasive on the Web. Informative web tables span a wide variety of topics and can naturally serve as a significant resource for satisfying user information needs. Driven by this observation, we investigate an important yet largely under-addressed problem: given millions of tables, how can we precisely retrieve table cells that answer a user's question? This work proposes a novel table cell search framework to address this problem. We first formulate the concept of a relational chain, which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We further employ deep neural networks to perform finer-grained inference on which relational chains best match the input question, and finally extract the corresponding answer cells. Using millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting on both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance can obtain a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that may be absent from, or difficult to identify in, existing KBs.
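The abstract describes relational chains only at a high level, so the sketch below is one hypothetical reading, not the authors' implementation. It assumes the question's topic entity is already known (standing in for the snippet-based candidate generation), represents a chain as the topic cell, its column header, the answer cell's column header, and the answer cell, and ranks chains by bag-of-words cosine similarity as a deliberately simple substitute for the deep neural networks. The names `RelationalChain`, `build_chains`, and `rank_chains` are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationalChain:
    # Hypothetical chain layout: topic cell -> its column header
    # -> answer column header -> answer cell (same table row).
    topic_cell: str
    topic_header: str
    answer_header: str
    answer_cell: str

    def relation_text(self) -> str:
        # Only the headers describe the relation; cell values are left out.
        return f"{self.topic_header} {self.answer_header}"


def build_chains(headers, rows, topic):
    """Link every cell equal to `topic` to each other cell in its row."""
    chains = []
    for row in rows:
        for i, cell in enumerate(row):
            if cell.lower() != topic.lower():
                continue
            for j, other in enumerate(row):
                if j != i:
                    chains.append(
                        RelationalChain(cell, headers[i], headers[j], other)
                    )
    return chains


def tokens(text):
    # Lowercased alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token lists as term-count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def rank_chains(question, chains):
    # Stand-in for the paper's neural matcher: highest similarity first.
    q = tokens(question)
    return sorted(
        chains, key=lambda c: cosine(q, tokens(c.relation_text())), reverse=True
    )
```

For example, with `headers = ["Country", "Capital", "Currency"]` and rows for France and Japan, `rank_chains("What is the capital of Japan?", build_chains(headers, rows, "Japan"))` places the chain ending in "Tokyo" first, because only the "Capital" header overlaps the question. A learned matcher would be needed for questions whose wording does not literally share tokens with the column headers.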

