Efficiently incorporating user feedback into information extraction and integration programs

Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. Pub Date: 2009-06-29. DOI: 10.1145/1559845.1559857

Xiaoyong Chai, Ba-Quy Vuong, A. Doan, J. Naughton


Citations: 68

Abstract

Many applications increasingly employ information extraction and integration (IE/II) programs to infer structure from unstructured data. Automatic IE/II is inherently imprecise. Hence such programs often make many IE/II mistakes, and thus can significantly benefit from user feedback. Today, however, there is no good way to automatically provide and process such feedback. On finding an IE/II mistake, users often must alert the developer team (e.g., via email or a Web form), and then wait for the team to manually examine the program internals to locate and fix the mistake: a slow, error-prone, and frustrating process. In this paper we propose a solution for users to directly provide feedback and for IE/II programs to automatically process such feedback. In our solution a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. The so-augmented program P is then executed and enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P, and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
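The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not hlog, and it does not reproduce the paper's execution strategy or optimizations. It only assumes a hypothetical two-stage program (an extraction step followed by an integration step) in which users may edit the intermediate records. Every name here (`extract`, `integrate`, `FeedbackPipeline`, `give_feedback`) is invented for illustration. The sketch simply re-runs the whole program after each edit and re-applies all stored edits, so new feedback is combined with prior feedback.

```python
import re
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical intermediate tuple: (title, year)


def extract(docs: List[str]) -> Dict[int, Record]:
    """Stage 1 (IE): pull a (title, year) pair out of each raw string."""
    records: Dict[int, Record] = {}
    for doc_id, text in enumerate(docs):
        match = re.search(r"(.+?),\s*(\d{4})", text)
        if match:
            records[doc_id] = (match.group(1).strip(), int(match.group(2)))
    return records


def integrate(records: Dict[int, Record]) -> Dict[str, List[int]]:
    """Stage 2 (II): group the extracted years under each title."""
    merged: Dict[str, List[int]] = {}
    for _, (title, year) in sorted(records.items()):
        merged.setdefault(title, []).append(year)
    return merged


class FeedbackPipeline:
    """Naive strategy (an assumption of this sketch): re-run everything
    after each edit, overriding extracted records with stored edits."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs
        self.edits: Dict[int, Record] = {}  # all feedback received so far

    def run(self) -> Dict[str, List[int]]:
        records = extract(self.docs)
        records.update(self.edits)  # user edits take precedence
        return integrate(records)   # propagate edits downstream

    def give_feedback(self, doc_id: int, corrected: Record) -> Dict[str, List[int]]:
        self.edits[doc_id] = corrected  # combine with prior feedback
        return self.run()


docs = ["Data Integration, 2009", "Data Integraton, 2008", "Query Processing, 2007"]
pipeline = FeedbackPipeline(docs)
before = pipeline.run()
after_first = pipeline.give_feedback(1, ("Data Integration", 2008))
after_second = pipeline.give_feedback(2, ("Query Processing", 2006))
```

Here the first edit corrects a misspelled title in the intermediate data, and the integration output merges the two "Data Integration" records as a result. The second edit leaves the first correction in place. Re-running the full program on every edit is the simplest possible approach; the abstract indicates that the paper goes on to describe optimization techniques beyond a baseline strategy, which this sketch does not attempt.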

