Effects of Data Anonymization by Cell Suppression on Descriptive Statistics and Predictive Modeling Performance

Proceedings. AMIA Symposium · Pub Date: 2002-11-01 · DOI: 10.1197/jamia.M1241

L. Ohno-Machado, S. Vinterbo, S. Dreiseitl

Citations: 19

Abstract

Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppressing selected cells. With table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals from whom a particular case must become indistinguishable. This number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being drawn from the disclosed data, and preventing such inferences may be important for preserving confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data but might also be used to make inferences about particular individuals, which may be undesirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.
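To make "level of anonymization" concrete, the following is a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes a released row is indistinguishable from another only when the two are identical after suppression (a suppressed cell, `None`, is treated as a literal value rather than a wildcard), and that level k means every row must match at least k - 1 other rows. `suppress_to_k` is a hypothetical greedy heuristic that suppresses one column at a time in rows whose group is still smaller than k. It can stop short of k-anonymity, so the result should be checked with `is_k_anonymous`. The toy table and all function names are invented for illustration.

```python
from collections import Counter
from typing import Sequence

# Hypothetical marker for a suppressed cell (a release might print it as "?").
SUPPRESSED = None

Row = tuple  # one record: a tuple of cell values, None meaning suppressed

# Invented toy quasi-identifiers: (age band, sex, 3-digit ZIP).
TOY_TABLE = [
    ("30-39", "F", "021"),
    ("30-39", "F", "021"),
    ("30-39", "M", "021"),
    ("30-39", "M", "022"),
    ("40-49", "F", "022"),
    ("40-49", "F", "022"),
]


def group_sizes(table: Sequence[Row]) -> list[int]:
    """For each row, count the rows identical to it (itself included)."""
    counts = Counter(table)
    return [counts[row] for row in table]


def is_k_anonymous(table: Sequence[Row], k: int) -> bool:
    """True if every row is indistinguishable from at least k - 1 other rows."""
    return all(size >= k for size in group_sizes(table))


def count_suppressed(table: Sequence[Row]) -> int:
    """Total number of suppressed cells, a crude measure of information loss."""
    return sum(cell is SUPPRESSED for row in table for cell in row)


def suppress_to_k(table: Sequence[Row], k: int, column_order: Sequence[int]) -> list[Row]:
    """Greedy heuristic (an assumption, not the paper's method): for each column
    in column_order, suppress that column in every row whose group is still
    smaller than k. Groups already of size >= k are never modified."""
    rows = [list(row) for row in table]
    for col in column_order:
        sizes = group_sizes([tuple(r) for r in rows])
        if all(size >= k for size in sizes):
            break  # already k-anonymous; stop suppressing
        for row, size in zip(rows, sizes):
            if size < k:
                row[col] = SUPPRESSED
    return [tuple(r) for r in rows]


if __name__ == "__main__":
    released = suppress_to_k(TOY_TABLE, k=2, column_order=[2, 1, 0])
    for row in released:
        print(row)
    print(is_k_anonymous(released, 2), count_suppressed(released))
```

On this toy table, k = 2 is reached by suppressing only the ZIP cell of the two "M" rows. Raising k forces more cells to be suppressed, which is one intuitive route by which anonymity level and the information left for descriptive statistics or a predictive model trade off against each other.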
