Optimizing PPDM in asynchronous sparse data using random projection

2008 IEEE International Conference on Information Reuse and Integration. Pub Date: 2008-07-13. DOI: 10.1109/IRI.2008.4583066

R. R. Kumar, J. Indumathi, G. Uma


Citations: 4

Abstract

Privacy is becoming an increasingly important issue in data-mining applications that deal with sensitive data, especially in health care, security, finance, and behavioral analysis. Most existing techniques address a Secure Two-Party Computation model, in which two parties, each holding a private database, want to cooperatively conduct data-mining operations on the union of their data. The problem we address for Privacy Preserving Data Mining (PPDM) is how a data owner can release a version of its confidential data with guarantees that the original sensitive information cannot be re-identified while the analytic properties of the data are preserved. In this paper we investigate the feasibility of using multiplicative random projection with sparse matrices to preserve privacy in datasets that are incremented asynchronously over time from various sources. The data stream itself is asynchronous. This work proposes the use of random projections with a sparse matrix to maintain a sketch of a collection of high-dimensional data streams that are updated asynchronously. This sketch allows us to estimate L2 (Euclidean) distances and dot products with high accuracy. We also propose a conceptual architecture for implementing privacy preservation techniques, especially the Sparse Random Projection Matrix technique, on incremental data to improve the level of privacy protection. Our tests indicate that the perturbed data still preserves certain statistical characteristics of the original unperturbed data. Finally, we propose a generic projection-based sketch for incremental data streams that can be used not only for this application but also for any other application that supports incremental databases.
We trace the origin of PPDM, the definition of privacy preservation in data mining, and the implications of benchmark privacy principles in information detection, and we advocate a few policies for PPDM based on these privacy principles. These are vital for the development and deployment of methodological solutions, and they will let vendors and developers build robust information reuse and integration (IRI) in PPDM. We seek to capitalize on the reuse of PPDM information by crafting simple, rich, and reusable knowledge representations, and accordingly investigate tactics for integrating this knowledge into legacy systems and for advancing the future of PPDM.
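The sketching idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sparse matrix is built, so the code below assumes an Achlioptas-style matrix (entries sqrt(s) times +1, 0, or -1 with probabilities 1/(2s), 1 - 1/s, 1/(2s)); the names make_sparse_projection, StreamSketch, estimate_distance, and estimate_dot are hypothetical and are not taken from the paper. Because the projection is linear, each update of the form x[index] += delta touches the sketch independently, so updates may arrive in any order.

```python
import math
import random


def make_sparse_projection(d, k, s=3, seed=0):
    """Build a k x d sparse random matrix (assumed Achlioptas-style, not from the paper).

    Each entry is sqrt(s) * (+1, 0, -1) with probabilities 1/(2s), 1 - 1/s, 1/(2s).
    """
    rng = random.Random(seed)
    scale = math.sqrt(s)
    matrix = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


class StreamSketch:
    """Maintains y = (1 / sqrt(k)) * R x for one stream under point updates."""

    def __init__(self, projection):
        self.projection = projection
        self.k = len(projection)
        self.values = [0.0] * self.k

    def update(self, index, delta):
        """Apply x[index] += delta without ever storing x itself."""
        norm = math.sqrt(self.k)
        for j in range(self.k):
            r = self.projection[j][index]
            if r != 0.0:  # most entries are zero, so most rows are skipped
                self.values[j] += delta * r / norm


def estimate_distance(a, b):
    """Estimate the L2 distance between the two underlying vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a.values, b.values)))


def estimate_dot(a, b):
    """Estimate the dot product of the two underlying vectors."""
    return sum(u * v for u, v in zip(a.values, b.values))


# Usage: two streams share one projection and receive updates at different times.
projection = make_sparse_projection(d=100, k=300, seed=42)
stream_a = StreamSketch(projection)
stream_b = StreamSketch(projection)
stream_a.update(5, 2.0)
stream_b.update(5, 1.0)
stream_a.update(17, -1.5)
print(estimate_distance(stream_a, stream_b), estimate_dot(stream_a, stream_b))
```

Both streams must share the same projection matrix for the estimates to be meaningful. Accuracy improves as the sketch dimension k grows, at the cost of more storage per stream; the values of d, k, and s used here are illustrative only.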


