Detection and limitation of interval inference in statistical databases

Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004). Published: 2004-06-21. DOI: 10.1109/SSDBM.2004.29

Claus Boyens, O. Günther


Citations: 2

Abstract

Interval inference is a specific kind of statistical disclosure in which a snooper collects and analyzes publicly available data to determine tight bounds on confidential numerical data. Institutions that disseminate public data include census bureaus and other independent organizations, such as regional healthcare initiatives that provide chronic disease data collected from physicians, pharmacies, and health maintenance organizations (HMOs). Such initiatives must ensure that the confidential values of the data providers are protected against interval inference while keeping the released information useful to prospective data users (such as medical researchers). In this paper, we consider the important case of 2-dimensional tables in which the rows correspond to the data providers and the columns to confidential data categories. Although the inner cells of such a table are confidential and should under no circumstances be published, marginal information about central tendency and dispersion can still be useful and worth publishing. The data-disseminating institution must therefore select which of these marginal data elements to publish so that no tight bounds on any inner table cell can be inferred. We present a new method that maximizes the usefulness of the disseminated information to prospective data users while ensuring the confidentiality of the inner table cell values. We give a computational analysis and compare our method to existing statistical disclosure methods.
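To make interval inference concrete, here is a minimal sketch. It assumes a table of nonnegative real values and a snooper who knows either the row and column sums, or, for a column, the mean, the population standard deviation and the number of providers. It uses two classical bounds (Fréchet bounds for known margins, Samuelson's inequality for mean and dispersion). It is not the method proposed in the paper, and `is_too_tight` with its `rel_protection` parameter is a hypothetical disclosure rule chosen only for illustration.

```python
"""Illustrative interval-inference bounds for a 2-D table of nonnegative values.

Assumptions (not taken from the paper): cells are real and nonnegative, and the
snooper knows either the row/column sums or a column's mean, population standard
deviation and size. These are classical bounds, not the authors' method.
"""
import math
import statistics
from typing import List, Tuple

Interval = Tuple[float, float]


def frechet_bounds(row_sums: List[float], col_sums: List[float]) -> List[List[Interval]]:
    """Tightest bounds on each inner cell x_ij when only the margins are known.

    With grand total N, a nonnegative cell satisfies
    max(0, r_i + c_j - N) <= x_ij <= min(r_i, c_j).
    """
    total = sum(row_sums)
    if not math.isclose(total, sum(col_sums)):
        raise ValueError("row and column sums must agree")
    return [[(max(0.0, r + c - total), min(r, c)) for c in col_sums] for r in row_sums]


def samuelson_bounds(mean: float, std: float, n: int) -> Interval:
    """Bounds on any single value in a column of n values with the given mean and
    population standard deviation: |x - mean| <= std * sqrt(n - 1) (Samuelson's
    inequality), intersected with the nonnegativity constraint."""
    if n < 1:
        raise ValueError("n must be at least 1")
    half_width = std * math.sqrt(n - 1)
    return (max(0.0, mean - half_width), mean + half_width)


def is_too_tight(bound: Interval, true_value: float, rel_protection: float) -> bool:
    """Hypothetical disclosure test: flag a cell whose inferred interval is
    narrower than rel_protection * true_value."""
    lo, hi = bound
    return (hi - lo) < rel_protection * true_value


if __name__ == "__main__":
    # Rows = data providers, columns = confidential categories (made-up values).
    table = [[90.0, 5.0], [2.0, 3.0]]
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    bounds = frechet_bounds(rows, cols)
    print(bounds[0][0], is_too_tight(bounds[0][0], table[0][0], 0.1))

    # Publishing mean and dispersion of column 0 instead of sums.
    col0 = [r[0] for r in table]
    print(samuelson_bounds(statistics.mean(col0), statistics.pstdev(col0), len(col0)))
```

In this made-up table the dominant provider's first cell (value 90) is confined to the interval [87, 92] by the margins alone, so a 10% protection rule would flag it. With only two providers, the column's mean and standard deviation pin its values down exactly, which illustrates why the choice of which marginal statistics to publish matters.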
