Using Synthetic Data to Replace Linkage Derived Elements: A Case Study.

Impact factor: 1.6 | JCR quartile: Q3 (Health Care Sciences & Services)
Dean M Resnick, Christine S Cox, Lisa B Mirel
DOI: 10.1007/s10742-021-00241-z
Published: 2021-02-03 (Journal Article)
Citations: 2

Abstract


While record linkage can expand the analyses that can be performed with survey microdata, it also increases the risk of disclosures that compromise privacy. One way to mitigate this risk is to replace some of the information added through linkage with synthetic data elements. This paper describes a case study using the National Hospital Care Survey (NHCS), which, under a pledge to protect patient privacy, collects patient records from a sample of U.S. hospitals for statistical analysis. The NHCS data were linked to the National Death Index (NDI) to add mortality information to the survey. The information added through NDI linkage enables survival analyses related to hospitalization, but because it includes dates of death and detailed causes of death, joining it to the patient records increases the risk of patient re-identification (albeit only for deceased persons). For this reason, an approach to generating synthetic data was tested: survival analysis models are used to replace vital status and actual dates of death with synthetic values, and classification tree analysis is used to replace actual causes of death with synthesized ones. The degree to which analyses of the synthetic data replicate results from analyses of the actual data is measured by comparing survival analysis parameter estimates from the two data files. Because synthetic data are valuable only to the extent that they yield statistical estimates similar to those based on the actual data, this evaluation is an essential first step in assessing the potential utility of synthetic mortality data.
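The abstract describes replacing linked vital status and dates of death with draws from a fitted survival model, then judging utility by comparing parameter estimates between the actual and synthetic files. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: a constant-hazard (exponential) survival model with one hypothetical binary covariate, simulated records standing in for the linked NHCS/NDI data, and the log hazard ratio as the single compared parameter. The classification tree step for causes of death is not shown, and every name (`Record`, `fit_exponential`, `synthesize`, `log_hazard_ratio`, `simulate_actual`) is hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Record:
    group: int       # hypothetical binary covariate (0 or 1)
    followup: float  # days from discharge to end of the linkage window (censoring time)
    died: bool       # vital status
    time: float      # days to death if died, else the censoring time


def fit_exponential(records):
    """Constant-hazard MLE per group under right censoring: deaths / person-time."""
    rates = {}
    for g in sorted({r.group for r in records}):
        rs = [r for r in records if r.group == g]
        deaths = sum(r.died for r in rs)
        exposure = sum(r.time for r in rs)
        rates[g] = deaths / exposure
    return rates


def synthesize(records, rates, rng):
    """Replace vital status and time to death with draws from the fitted model.

    Covariates and follow-up windows are kept as-is; only the
    linkage-derived fields (died, time) are regenerated.
    """
    synthetic = []
    for r in records:
        t = rng.expovariate(rates[r.group])  # synthetic time to death
        died = t <= r.followup               # death observed only inside the window
        synthetic.append(Record(r.group, r.followup, died, t if died else r.followup))
    return synthetic


def log_hazard_ratio(records):
    """Log hazard ratio of group 1 versus group 0 under the exponential model."""
    rates = fit_exponential(records)
    return math.log(rates[1] / rates[0])


def simulate_actual(n, rng):
    """Simulated stand-in for linked records; NOT real NHCS/NDI data."""
    true_rate = {0: 0.001, 1: 0.003}  # assumed daily hazards, for illustration only
    records = []
    for i in range(n):
        g = i % 2
        c = rng.uniform(365, 1095)
        t = rng.expovariate(true_rate[g])
        died = t <= c
        records.append(Record(g, c, died, t if died else c))
    return records


if __name__ == "__main__":
    rng = random.Random(2021)
    actual = simulate_actual(10_000, rng)
    synthetic = synthesize(actual, fit_exponential(actual), rng)
    print(f"log HR, actual data:    {log_hazard_ratio(actual):.3f}")
    print(f"log HR, synthetic data: {log_hazard_ratio(synthetic):.3f}")
```

Because the synthetic times are drawn from a model fitted to the actual file, the two log hazard ratios should agree up to sampling noise. A real evaluation would use the survey's own survival models and covariates and compare many estimates rather than one.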

Source journal: Health Services and Outcomes Research Methodology (Health Care Sciences & Services)
CiteScore: 3.40
Self-citation rate: 6.70%
Articles published: 28
Journal description: The journal reflects the multidisciplinary nature of the field of health services and outcomes research. It addresses the needs of multiple, interlocking communities, including methodologists in statistics, econometrics, social and behavioral sciences; designers and analysts of health policy and health services research projects; and health care providers and policy makers who need to properly understand and evaluate the results of published research. The journal strives to enhance the level of methodologic rigor in health services and outcomes research and contributes to the development of methodologic standards in the field. In pursuing its main objective, the journal also provides a meeting ground for researchers from a number of traditional disciplines and fosters the development of new quantitative, qualitative, and mixed methods by statisticians, econometricians, health services researchers, and methodologists in other fields. Health Services and Outcomes Research Methodology publishes: Research papers on quantitative, qualitative, and mixed methods; Case Studies describing applications of quantitative and qualitative methodology in health services and outcomes research; Review Articles synthesizing and popularizing methodologic developments; Tutorials; Articles on computational issues and software reviews; Book reviews; and Notices. Special issues will be devoted to papers presented at important workshops and conferences.