An Improved Profile-Based CF Scheme with Privacy

Alper Bilge, H. Polat
2011 IEEE Fifth International Conference on Semantic Computing (ICSC), 18 September 2011. DOI: 10.1109/ICSC.2011.20
Cited by: 13

Abstract

Traditional collaborative filtering (CF) systems, which widely employ k-nearest neighbor (kNN) algorithms, mainly attempt to alleviate the contemporary problem of information overload by generating personalized predictions for items that users might like. Despite their popularity and extensive usage, they suffer from several problems. First, as the number of users and/or items increases, scalability becomes a challenge. Second, as the number of ratable items grows while the number of ratings provided by each individual remains a tiny fraction of it, CF systems suffer from the sparsity problem. Third, many schemes fail to protect private data, which is referred to as the privacy problem. These problems degrade both accuracy and online performance. In this paper, we propose two preprocessing schemes to overcome the scalability and sparsity problems. First, we suggest a novel content-based profiling of users to estimate similarities on reduced data for better performance. Second, we propose a pseudo-prediction protocol to help CF systems surmount sparsity. Finally, we propose using randomization methods to preserve individual users' confidential data, and we show that our proposed preprocessing schemes can be applied to perturbed data. We analyze our schemes in terms of privacy. To investigate their effects on accuracy and performance, we perform experiments on real data. Empirical results demonstrate that our preprocessing schemes improve both performance and accuracy.
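The abstract combines three ingredients: randomized perturbation of private ratings, content-based user profiles that shrink the data on which kNN similarities are computed, and pseudo-predictions that fill in missing ratings. The sketch below is a hypothetical illustration of how these pieces could fit together, not the authors' protocol. It assumes Gaussian noise added to z-scored ratings as the randomization method, genre labels as the item "content", per-genre averages as the user profile, cosine similarity over profiles, and a profile-based estimate as the pseudo-prediction. All function names, parameters (`sigma`, `k`), and data are illustrative assumptions.

```python
import math
import random

def zscore(r):
    """Return (mean, std, z-scores) for one user's {item: rating} dict."""
    vals = list(r.values())
    mu = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1.0
    return mu, sd, {i: (v - mu) / sd for i, v in r.items()}

def perturb(ratings, sigma, rng):
    """Client side (assumed scheme): z-score own ratings, then add zero-mean
    Gaussian noise; only these disguised values are sent to the server."""
    return {u: {i: z + rng.gauss(0.0, sigma) for i, z in zscore(r)[2].items()}
            for u, r in ratings.items()}

def profile(user_data, item_genres, genres):
    """Server side (hypothetical profiling): average a user's disguised values
    per genre, giving a vector of length len(genres) instead of len(items)."""
    sums = {g: 0.0 for g in genres}
    counts = {g: 0 for g in genres}
    for item, v in user_data.items():
        for g in item_genres[item]:
            sums[g] += v
            counts[g] += 1
    return [sums[g] / counts[g] if counts[g] else 0.0 for g in genres]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def pseudo_value(prof, item, item_genres, genres):
    """Hypothetical pseudo-prediction: estimate an unrated cell as the mean of
    the user's profile entries for the item's genres."""
    idx = [genres.index(g) for g in item_genres[item]]
    return sum(prof[j] for j in idx) / len(idx)

def predict(data, active, item, item_genres, genres, k=2):
    """kNN prediction in z-score space: neighbors are ranked by profile
    similarity; a neighbor who did not rate `item` contributes a
    pseudo-prediction instead of being skipped."""
    profs = {u: profile(d, item_genres, genres) for u, d in data.items()}
    sims = sorted(((cosine(profs[active], profs[u]), u)
                   for u in data if u != active), reverse=True)[:k]
    num = den = 0.0
    for s, u in sims:
        v = data[u].get(item)
        if v is None:
            v = pseudo_value(profs[u], item, item_genres, genres)
        num += s * v
        den += abs(s)
    return num / den if den else 0.0

# Illustrative toy data (not from the paper).
genres = ["action", "comedy", "drama"]
item_genres = {"i1": ["action"], "i2": ["comedy"], "i3": ["drama"],
               "i4": ["action", "drama"]}
ratings = {
    "alice": {"i1": 5, "i2": 1, "i3": 3},
    "bob": {"i1": 4, "i2": 2, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}
disguised = perturb(ratings, sigma=0.5, rng=random.Random(42))
z = predict(disguised, "alice", "i4", item_genres, genres, k=2)
mu, sd, _ = zscore(ratings["alice"])  # de-normalized on alice's side only
print(round(mu + sd * z, 2))
```

In this sketch the server only ever sees noisy z-scores, and similarities are computed over three genre averages rather than over every item, which is the kind of reduction the abstract attributes to profile-based similarity estimation. Returning the prediction in z-score space lets the active user undo the normalization locally without revealing their mean or standard deviation.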