Detection and limitation of interval inference in statistical databases
Authors: Claus Boyens, O. Günther
Venue: Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.
Published: 2004-06-21
DOI: 10.1109/SSDBM.2004.29 (https://doi.org/10.1109/SSDBM.2004.29)
Citations: 2
Abstract
Interval inference is a specific kind of statistical disclosure in which a snooper collects and analyzes publicly available data to determine tight bounds on confidential numerical data. Institutions that disseminate public data include Census Bureaus and other independent organizations, such as regional healthcare initiatives that provide chronic disease data collected from physicians, pharmacies and health maintenance organizations (HMOs). Such initiatives must ensure that the confidential values of the data providers are protected against interval inference while making sure that the released information is still useful for the prospective data users (such as medical researchers). In this paper, we consider the important case of two-dimensional tables where the rows correspond to the data providers and the columns to confidential data categories. Although the inner cells of this table are confidential and should under no circumstances be published, marginal information about central tendency and dispersion can still be useful and worth publishing. It is the task of the data-disseminating institution to select these specific marginal data elements for publication such that no tight bounds on any inner table cell can be inferred. We present a new method that maximizes the usefulness of the disseminated information to the prospective data users while ensuring the confidentiality of the inner table cell values. We give a computational analysis and compare our method to existing statistical disclosure methods.
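To make the notion of interval inference concrete, the following minimal Python sketch shows how published marginals alone can bound a confidential cell. It is an illustration only, not the method proposed in the paper. It assumes the snooper knows a column's count, mean, and population standard deviation, and it applies Samuelson's inequality, which states that every value lies within sqrt(n - 1) population standard deviations of the mean. The function name `samuelson_bounds`, the nonnegativity assumption, and the sample column are all hypothetical.

```python
import math
import statistics


def samuelson_bounds(n, mean, pop_std, lower=None, upper=None):
    """Return an interval (lo, hi) that must contain each of the n values.

    Uses Samuelson's inequality: |x_i - mean| <= pop_std * sqrt(n - 1),
    where pop_std is the population standard deviation (divisor n).
    Optional a-priori limits (e.g. nonnegativity) narrow the interval.
    """
    half_width = pop_std * math.sqrt(n - 1)
    lo, hi = mean - half_width, mean + half_width
    if lower is not None:
        lo = max(lo, lower)
    if upper is not None:
        hi = min(hi, upper)
    return lo, hi


# Hypothetical confidential column: one value per data provider.
confidential_column = [10, 12, 11, 13, 54]

# Marginals the institution might consider publishing (assumption).
n = len(confidential_column)
mean = statistics.mean(confidential_column)
pop_std = statistics.pstdev(confidential_column)

# What a snooper can infer from the marginals alone, assuming the
# snooper also knows that all values are nonnegative.
lo, hi = samuelson_bounds(n, mean, pop_std, lower=0.0)
print(f"every cell lies in [{lo:.2f}, {hi:.2f}]")

# Compare the inferred upper bound with the largest true value.
largest = max(confidential_column)
print(f"gap between upper bound and largest cell: {hi - largest:.2f}")
```

In this made-up column, the largest value sits almost exactly on the inferred upper bound, which is the kind of tight bound a disseminating institution would want to detect before releasing both statistics together.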