Effects of Data Anonymization by Cell Suppression on Descriptive Statistics and Predictive Modeling Performance

L. Ohno-Machado, S. Vinterbo, S. Dreiseitl
Proceedings. AMIA Symposium. Published 2002-11-01. DOI: 10.1197/jamia.M1241 (https://doi.org/10.1197/jamia.M1241)
Citations: 19

Abstract

Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppression of selected cells. Using table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals that a particular case must become indistinguishable from. This number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data. Preventing inferences may be important to preserve confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data, but might also be used for making inferences on particular individuals, which is a feature that may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.
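
The idea of an anonymization level can be illustrated with a small sketch. This is not the authors' algorithm: it assumes one plausible reading of the abstract, in which a disclosed row with suppressed cells is "indistinguishable" from every original row that agrees with it on all non-suppressed cells, and the level of a table is the smallest such count over its rows. The function names, the use of `None` for a suppressed cell, and the toy records are all hypothetical.

```python
from typing import Optional, Sequence

# Assumption: a suppressed cell is represented by None.
Row = Sequence[Optional[str]]


def compatible(disclosed: Row, original: Row) -> bool:
    """True if every non-suppressed cell of `disclosed` equals the original cell."""
    return all(d is None or d == o for d, o in zip(disclosed, original))


def ambiguity_level(disclosed_table: Sequence[Row], original_table: Sequence[Row]) -> int:
    """Smallest number of original rows that any disclosed row could correspond to.

    Assumed definition for illustration only; expects a non-empty disclosed table.
    """
    return min(
        sum(compatible(d, o) for o in original_table)
        for d in disclosed_table
    )


# Toy data (invented): age, sex, presenting complaint.
original = [
    ("34", "F", "chest pain"),
    ("34", "M", "chest pain"),
    ("51", "F", "dyspnea"),
    ("51", "F", "chest pain"),
]

# The same table after suppressing selected cells.
disclosed = [
    ("34", None, "chest pain"),
    ("34", None, "chest pain"),
    ("51", "F", None),
    ("51", "F", None),
]

level_before = ambiguity_level(original, original)
level_after = ambiguity_level(disclosed, original)
```

In this toy table every original row is unique, so the unsuppressed table has level 1, while the suppressed version makes each row consistent with two original records. Note that the suppressed cells are exactly the information a predictive model loses, which is the trade-off the abstract describes.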