Efficient Feedback Collection for Pay-as-you-go Source Selection

Proceedings of the 28th International Conference on Scientific and Statistical Database Management. Publication date: 2016-07-18. DOI: 10.1145/2949689.2949690

Julio César Cortés Ríos, N. Paton, A. Fernandes, Khalid Belhajjame

Citations: 12

Abstract

Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given these physical sources, it is then also possible to create further virtual sources that integrate, aggregate or summarise the data from the original sources. As a result, there is a plethora of data sources, from which a small subset may be able to provide the information required to support a task. The number and rate of change in the available sources is likely to make manual source selection and curation by experts impractical for many applications, leading to the need to pursue a pay-as-you-go approach, in which crowds or data consumers annotate results based on their correctness or suitability, with the resulting annotations used to inform, e.g., source selection algorithms. However, for pay-as-you-go feedback collection to be cost-effective, it may be necessary to select judiciously the data items on which feedback is to be obtained. This paper describes OLBP (Ordering and Labelling By Precision), a heuristics-based approach to the targeting of data items for feedback to support mapping and source selection tasks, where users express their preferences in terms of the trade-off between precision and recall. The proposed approach is then evaluated on two different scenarios, mapping selection with synthetic data, and source selection with real data produced by web data extraction. The results demonstrate a significant reduction in the amount of feedback required to reach user-provided objectives when using OLBP.
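As a rough illustration of the pay-as-you-go idea described above, the sketch below estimates each source's precision from correct/incorrect annotations and then picks a set of sources under a user-chosen weighting between precision and recall. This is not OLBP itself, whose heuristics are not given in the abstract. The `Source` class, the `select_sources` function, the Laplace smoothing, and the linear weighting are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical record of a data source and the feedback collected on it."""
    name: str
    size: int            # number of data items the source returns (assumed > 0)
    correct: int = 0     # items annotated as correct so far
    incorrect: int = 0   # items annotated as incorrect so far

    def add_feedback(self, is_correct: bool) -> None:
        """Record one user annotation on an item from this source."""
        if is_correct:
            self.correct += 1
        else:
            self.incorrect += 1

    def estimated_precision(self) -> float:
        # Laplace-smoothed fraction of annotated items judged correct
        # (an assumption of this sketch); with no feedback this is 0.5.
        return (self.correct + 1) / (self.correct + self.incorrect + 2)


def select_sources(sources, precision_weight):
    """Return the best prefix of sources ranked by estimated precision.

    precision_weight in [0, 1] is the user's preference: 1.0 cares only
    about precision, 0.0 only about recall. The linear combination used
    here is an illustrative choice, not the paper's objective.
    """
    if not sources:
        return []
    ranked = sorted(sources, key=lambda s: s.estimated_precision(), reverse=True)
    # Expected number of correct items if every source were selected.
    total_correct = sum(s.estimated_precision() * s.size for s in ranked)

    best_score, best_k = float("-inf"), 0
    correct_so_far, items_so_far = 0.0, 0
    for k, src in enumerate(ranked, start=1):
        correct_so_far += src.estimated_precision() * src.size
        items_so_far += src.size
        precision = correct_so_far / items_so_far
        recall = correct_so_far / total_correct
        score = precision_weight * precision + (1 - precision_weight) * recall
        if score > best_score:  # strict '>' keeps the smaller set on ties
            best_score, best_k = score, k
    return ranked[:best_k]
```

In this sketch, every additional annotation sharpens the precision estimates, so the selected set can change as feedback arrives. Deciding which items to request feedback on, so that fewer annotations are needed, is the problem the paper addresses and is deliberately left out here.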
