A New Data Collection Technique for Preserving Privacy.

Q2 Mathematics

Journal of Privacy and Confidentiality Pub Date : 2016-01-01 Epub Date: 2018-02-02 DOI:10.29012/jpc.v7i3.408

Samuel S Wu, Shigang Chen, Deborah Burr, Long Zhang

Journal of Privacy and Confidentiality, volume 7, issue 3, pages 99-129. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6589820/pdf/nihms-1035142.pdf

Citations: 11

Abstract


A major obstacle to medical and social research is the lack of reliable data caused by people's reluctance to reveal private information to strangers. Fortunately, statistical inference always targets a well-defined population rather than a particular individual subject and, in many current applications, data can be collected using a web-based system or mobile devices. These two characteristics enable us to develop a data collection method, called triple matrix-masking (TM²), which offers strong privacy protection through an immediate matrix transformation, so that even the researchers cannot see the data, and then uses further matrix transformations to guarantee that the data remain analyzable by standard statistical methods. The entities involved in the proposed process are a masking service provider, who receives the initially masked data and then applies another mask, and the data collectors, who partially decrypt the now doubly masked data and then apply a third mask before releasing the data to the public. A critical feature of the method is that the keys used to generate the matrices are held separately. This ensures that nobody sees the actual data; yet, because of the specially designed transformations, statistical inference on parameters of interest can be conducted with the same results as if the original data were used. Hence the TM² method hides sensitive data with no efficiency loss for statistical inference on binary and normal data, which improves on Warner's randomized response technique. In addition, we add two features to the proposed procedure: an error-checking mechanism is built into the data collection process to make sure that the masked data used for analysis are an appropriate transformation of the original data, and a partial masking technique is introduced to grant data users access to non-sensitive personal information while sensitive information remains hidden.
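The claim that inference on masked data gives "the same results as if the original data were used" can be illustrated with a small numerical sketch. The abstract does not specify the form of the masks, so the following rests on assumptions: the first mask is taken to be right-multiplication by an invertible matrix `B` (key held by the data collectors), and the second and third masks are taken to be left-multiplication by random orthogonal matrices `A1` (key held by the masking service provider) and `A2` (key held by the data collectors). All variable names, the simulated data, and the choice of ordinary least squares as the analysis are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Simulated "private" data: response y and covariates Z (with an intercept column).
n, p = 200, 3
Z = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = Z @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
X = np.column_stack([y, Z])  # n x (p + 1) data matrix, one row per participant

# Assumed keys, held by separate parties.
B = rng.standard_normal((p + 1, p + 1))  # collectors' column mask (invertible with probability 1)
A1 = random_orthogonal(n, rng)           # provider's row mask
A2 = random_orthogonal(n, rng)           # collectors' row mask

# Mask 1: each participant's row is multiplied by B before leaving the device.
masked_once = X @ B
# Mask 2: the provider, who does not know B, applies A1.
masked_twice = A1 @ masked_once
# Partial decryption and mask 3: the collectors remove B, then apply A2.
released = A2 @ (masked_twice @ np.linalg.inv(B))  # equals A2 @ A1 @ X

def ols(data):
    """Least-squares coefficients from a matrix laid out as [y, Z]."""
    coef, *_ = np.linalg.lstsq(data[:, 1:], data[:, 0], rcond=None)
    return coef

beta_raw = ols(X)
beta_masked = ols(released)
print(beta_raw)
print(beta_masked)
```

Under these assumptions the released matrix is an orthogonal row transformation of the original data, so the cross-products that least squares depends on are unchanged and the two coefficient vectors agree up to floating-point error, even though individual rows of `released` no longer correspond to any participant. The collectors only ever see `A1 @ X`, and the provider only ever sees `X @ B`, which is why holding the keys separately matters in this sketch.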


Source journal

Journal of Privacy and Confidentiality (Computer Science: Computer Science (miscellaneous))

CiteScore: 3.10
Self-citation rate: 0.00%
Articles published:
Review time: 24 weeks