Privacy risk from synthetic data: practical proposals
Gillian M Raab
arXiv - STAT - Applications, 2024-09-06
DOI: arxiv-2409.04257 (https://doi.org/arxiv-2409.04257)
Abstract
This paper proposes and compares measures of identity and attribute
disclosure risk for synthetic data. Data custodians can use the methods
proposed here to inform the decision as to whether to release synthetic
versions of confidential data. Different measures are evaluated on two data
sets. Insight into the measures is obtained by examining the details of the
records identified as posing a disclosure risk. This leads to methods to
identify, and possibly exclude, apparently risky records where the
identification or attribution would be expected by someone with background
knowledge of the data. The methods described are available as part of the
synthpop package for R.
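
To make the notion of an identity disclosure measure concrete, the following is a minimal sketch in Python of a replicated-uniques check: the share of original records whose combination of key variables is unique in the original data and also unique in the synthetic data. This is an illustrative assumption about what such a measure might look like; it is not taken from the paper and it is not the synthpop implementation. The function name `replicated_uniques`, the key variables, and the toy records are all hypothetical.

```python
from collections import Counter


def replicated_uniques(original, synthetic, keys):
    """Share of original records that are unique on `keys` in the original
    data and whose key combination is also unique in the synthetic data.

    Hypothetical helper for illustration only, not the synthpop API.
    """
    def key_of(record):
        # Combine the chosen key variables into one hashable tuple.
        return tuple(record[k] for k in keys)

    orig_counts = Counter(key_of(r) for r in original)
    synth_counts = Counter(key_of(r) for r in synthetic)

    # Key combinations held by exactly one record in each data set.
    orig_unique = {k for k, n in orig_counts.items() if n == 1}
    synth_unique = {k for k, n in synth_counts.items() if n == 1}

    replicated = orig_unique & synth_unique
    return len(replicated) / len(original) if original else 0.0


# Toy data (made up): four original and four synthetic records.
original = [
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 71, "sex": "M", "region": "South"},
    {"age": 52, "sex": "F", "region": "East"},
]
synthetic = [
    {"age": 71, "sex": "M", "region": "South"},
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 52, "sex": "F", "region": "East"},
    {"age": 52, "sex": "F", "region": "East"},
]

risk = replicated_uniques(original, synthetic, ["age", "sex", "region"])
print(risk)
```

In this toy example only the (71, "M", "South") record is unique in both data sets, so one of the four original records is flagged. Whether a flagged record is a genuine risk still depends on the kind of record-level inspection the abstract describes.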