Measuring data privacy preserving and machine learning

2018 7th International Conference On Software Process Improvement (CIMPS) | Pub Date: 2018-10-01 | DOI: 10.1109/CIMPS.2018.8625613

Luis Gustavo Esquivel-Quirós, E. G. Barrantes, Fernando Esponda Darlington


Citations: 3

Abstract

The increasing publication of large amounts of theoretically anonymous data can lead to a number of attacks on people's privacy. Publishing sensitive data without exposing the data owners is generally not among software developers' concerns. Data privacy regulations create an appropriate scenario for focusing on privacy from the perspective of the data use or exploration that takes place in an organization. The increasing number of sanctions for privacy violations motivates a systematic comparison of three known machine learning algorithms in order to measure the usefulness of privacy-preserved data. The scope of the evaluation is extended by comparing them with a known privacy preservation metric. Different parameter scenarios and privacy levels are used. The use of publicly available implementations, together with the presentation of the methodology, the explanation of the experiments, and the analysis, provides a working framework for the problem of privacy preservation. Problems are shown in measuring the usefulness of the data and its relationship with privacy preservation. The findings motivate the need to create metrics optimized for the privacy preferences of the data owners, since the risk of predicting sensitive attributes by means of machine learning techniques is usually not eliminated. In addition, it is shown that there may be a hundred percent, but it cannot be measured. Such metrics should also ensure adequate performance of the machine learning models that are of interest to the organization publishing the data.
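The kind of comparison the abstract describes can be illustrated with a minimal sketch. The abstract does not name the three algorithms or the privacy metric, so the choices here are assumptions made purely for illustration: k-anonymity stands in for the privacy metric, a per-group majority vote stands in for a machine learning model, and the records, the `generalize` levels, and all function names are hypothetical. Accuracy is measured on the same toy rows used to build the groups, not on held-out data.

```python
from collections import Counter, defaultdict

# Toy records: (age, zip_code, sensitive_attribute). All values are made up.
RECORDS = [
    (23, "10115", "flu"),
    (27, "10117", "flu"),
    (25, "10119", "asthma"),
    (41, "20095", "diabetes"),
    (45, "20097", "diabetes"),
    (48, "20099", "diabetes"),
]


def generalize(record, level):
    """Coarsen the quasi-identifiers; a higher level hides more detail."""
    age, zip_code, sensitive = record
    if level == 0:
        return (age, zip_code), sensitive          # no generalization
    if level == 1:
        return (age // 10 * 10, zip_code[:3]), sensitive  # decade, zip prefix
    return ("*", "*"), sensitive                   # fully suppressed


def k_anonymity(rows):
    """Size of the smallest group sharing the same quasi-identifier values."""
    return min(Counter(qi for qi, _ in rows).values())


def majority_accuracy(rows):
    """Accuracy of guessing the sensitive value by majority vote per group."""
    groups = defaultdict(Counter)
    for qi, sensitive in rows:
        groups[qi][sensitive] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(rows)


def evaluate(level):
    """Return (k, accuracy) for the toy table at one generalization level."""
    rows = [generalize(r, level) for r in RECORDS]
    return k_anonymity(rows), majority_accuracy(rows)


if __name__ == "__main__":
    for level in (0, 1, 2):
        k, acc = evaluate(level)
        print(f"level={level} k={k} accuracy={acc:.2f}")
```

On this toy table, level 1 reaches k = 3 while the sensitive attribute is still guessed correctly for 5 of 6 rows, which mirrors the abstract's point that anonymization does not necessarily remove the risk of predicting sensitive attributes.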

