Yuchuan Hu, Bitao Hu, Bing Guo, Cheng Dai, Yan Shen
Title: SDDP: sensitive data detection method for user-controlled data pricing
Journal: Applied Intelligence, 55 (6), published 2025-03-19
DOI: 10.1007/s10489-025-06229-3
URL: https://link.springer.com/article/10.1007/s10489-025-06229-3
Impact factor: 3.4; JCR: Q2 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
In the era of big data, there is an urgent need for data sharing, and data pricing is a crucial part of it: a reasonable price both increases users' willingness to share data and promotes data sharing overall. However, current research mostly takes the perspective of data sharing platforms and treats all data equally, without sufficiently evaluating the sensitive data within shared datasets or users' own perception of privacy. To address this problem, we detect sensitive data in each piece of data and then define a pricing function based on information entropy and the user's perception of sensitive information. To improve the accuracy of sensitive data detection, we integrate an attention mechanism into a pre-trained model to represent the samples comprehensively. Label correlation vectors are then generated automatically and used to calculate a correlation matrix, and a graph convolutional neural network is employed to mine the correlation between labels. Based on the detection results, information entropy and user ratings are then mapped to prices. Pricing based on user ratings is better suited to personal data than to government or institutional data. Experimental results on a dataset of Twitter text sent by users show that the average precision of our sensitive data detection model improves by up to 9.26% over the comparison models, and that SDDP can provide reasonable pricing for samples containing sensitive data and fair compensation for users.
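The abstract does not give the pricing function itself, so the following is only a minimal sketch of how information entropy and user ratings might be combined into a price. The function names, the `base_rate` parameter, the rating scale of 0 to 1, and the multiplicative form are all assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def price_sample(tokens, detected_labels, user_ratings, base_rate=1.0):
    """Hypothetical pricing rule (an assumption, not the paper's function).

    tokens:          the tokenized text of one sample.
    detected_labels: sensitive categories a detector flagged in the sample.
    user_ratings:    the user's own sensitivity rating per category, in [0, 1].
    base_rate:       price per bit of entropy for non-sensitive content.
    """
    entropy = shannon_entropy(tokens)
    # Sum the user's ratings for every detected category; unrated
    # categories contribute nothing.
    sensitivity = sum(user_ratings.get(label, 0.0) for label in detected_labels)
    # More informative and more sensitive samples receive a higher price.
    return base_rate * entropy * (1.0 + sensitivity)
```

In this sketch a sample with no detected sensitive labels is priced on entropy alone, and each detected label raises the price in proportion to how sensitive the user rated that category.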
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.