When Crowdsourcing Meets Data Markets: A Fair Data Value Metric for Data Trading

IF 1.3 · Zone 3 (Computer Science) · JCR Q4 (Computer Science, Hardware & Architecture)

Journal of Computer Science and Technology · Pub Date: 2024-07-22 · DOI: 10.1007/s11390-023-2519-0

Yang-Su Liu, Zhen-Zhe Zheng, Fan Wu, Gui-Hai Chen


Citations: 0

Abstract

Large quantities of high-quality data are critical to the success of machine learning in diverse applications. Data silos make data difficult to circulate, and emerging data markets attempt to break this dilemma by facilitating data exchange on the Internet. Crowdsourcing, meanwhile, is an important method for efficiently collecting large amounts of high-value data for data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across crowdsourcing and data markets. We propose a new metric of data value: the reduction in uncertainty of a Bayesian machine learning model when the data is integrated into model training. Guided by this metric, we design a mechanism called Shapley Value Mechanism with Individual Rationality (SV-IR). It combines a greedy algorithm with a constant approximation ratio, which selects the most cost-efficient data brokers, with a fair compensation determination rule based on the Shapley value that respects the individual rationality constraints. We further propose a fair reward distribution method for data holders who exert different effort levels under the charge of a data broker. We demonstrate the fairness of the compensation determination rule and the reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR can approach the optimal solution and outperforms state-of-the-art methods.
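The ideas named in the abstract can be illustrated with a toy sketch. It is not the paper's SV-IR mechanism: it assumes a deliberately simple Bayesian model (a Gaussian posterior over one scalar parameter), summarizes each broker's data by the precision it adds, and uses hypothetical function names throughout. Data value is the entropy reduction of the posterior, compensation shares are exact Shapley values of that value function, and brokers are picked greedily by marginal value per unit cost. Individual rationality adjustments and the paper's approximation guarantee are not modeled here.

```python
import itertools
import math


def uncertainty_reduction(precisions, prior_precision=1.0):
    """Entropy reduction (nats) of a Gaussian posterior over a scalar parameter.

    Assumption: each broker's data adds a known precision (n_i / sigma_i^2)
    to the posterior, so the value of a coalition is 0.5 * ln(post / prior).
    """
    return 0.5 * math.log(1.0 + sum(precisions) / prior_precision)


def shapley_values(precisions, prior_precision=1.0):
    """Exact Shapley value of every broker (exponential time, small n only)."""
    n = len(precisions)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that broker i joins
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                base = [precisions[j] for j in subset]
                gain = (uncertainty_reduction(base + [precisions[i]], prior_precision)
                        - uncertainty_reduction(base, prior_precision))
                phi[i] += weight * gain
    return phi


def greedy_select(precisions, costs, budget, prior_precision=1.0):
    """Repeatedly add the affordable broker with the best marginal value per cost.

    Assumption: all costs are strictly positive.
    """
    chosen, spent = [], 0.0
    remaining = set(range(len(precisions)))
    while True:
        held = [precisions[j] for j in chosen]
        current = uncertainty_reduction(held, prior_precision)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue  # broker i does not fit in the remaining budget
            gain = uncertainty_reduction(held + [precisions[i]], prior_precision) - current
            ratio = gain / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing affordable adds value
            return chosen
        chosen.append(best)
        spent += costs[best]
        remaining.remove(best)


if __name__ == "__main__":
    precisions = [4.0, 4.0, 0.0, 9.0]   # hypothetical brokers
    costs = [1.0, 1.0, 1.0, 5.0]
    print(shapley_values(precisions))
    print(greedy_select(precisions, costs, budget=2.0))
```

In this sketch the Shapley shares sum to the value of the full coalition, brokers with identical data receive identical shares, and a broker whose data adds no precision receives zero, which are the fairness properties usually associated with the Shapley value.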




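Source journal

Journal of Computer Science and Technology (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 4.00

Self-citation rate: 0.00%

Articles published: 2255

Review time: 9.8 months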

About the journal: Journal of Computer Science and Technology (JCST), the first English-language journal in the computer field published in China, is an international forum for scientists and engineers involved in all aspects of computer science and technology to publish high-quality, refereed papers. Papers reporting original research and innovative applications from all parts of the world are welcome. Papers for publication in the journal are selected through rigorous peer review to ensure originality, timeliness, relevance, and readability. While the journal emphasizes the publication of previously unpublished materials, selected conference papers with exceptional merit that require wider exposure are, at the discretion of the editors, also published, provided they meet the journal's peer review standards. The journal also seeks clearly written survey and review articles from experts in the field, to promote insightful understanding of the state of the art and technology trends. Topics covered by Journal of Computer Science and Technology include but are not limited to:
- Computer Architecture and Systems
- Artificial Intelligence and Pattern Recognition
- Computer Networks and Distributed Computing
- Computer Graphics and Multimedia
- Software Systems
- Data Management and Data Mining
- Theory and Algorithms
- Emerging Areas