Sensitivity of cancer registry linkage with missing or incomplete social security number and implications for cancer cohorts.

IF 4.4 | Zone 2 (Medicine) | Q1 Public, Environmental & Occupational Health
Lauren E McCullough, Anusila Deka, Christina Newton, Peter Briggs, Erin Gardner, Kevin C Ward, Lauren R Teras, Alpa V Patel
Journal: Epidemiology. Publication type: Journal Article. Publication date: 2025-09-09. DOI: 10.1097/EDE.0000000000001913 (https://doi.org/10.1097/EDE.0000000000001913)
Citation count: 0

Abstract

Background: Linking cancer cohort participants to state cancer registries typically relies on personally identifiable information, including Social Security Numbers (SSN), which uniquely identify individuals. However, complete SSN collection can be limited due to privacy concerns. This study evaluates the sensitivity of cancer registry linkage using partial or missing SSN and examines differences by demographic characteristics.

Methods: Using data from 284,361 participants in the Cancer Prevention Study-3 (CPS-3), we conducted probabilistic linkages with cancer registries in Georgia, Ohio, and Texas using Match*Pro software. Participants were linked using combinations of personally identifiable information: complete SSN, partial SSN (last four digits), and missing SSN. We compared the sensitivity of linkages before and after manual review and stratified by sex, age, and race-ethnicity.
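The probabilistic linkage described in the Methods can be illustrated with a minimal Fellegi-Sunter-style sketch. This is not Match*Pro's implementation, and the field names, m/u probabilities, and thresholds below are all hypothetical values chosen for illustration. It shows how agreement on each identifier adds evidence, how a missing field (such as SSN) simply contributes nothing, and how intermediate scores could be routed to manual review.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the study):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.001),
    "ssn_last4": (0.98, 0.0001),
}


def match_weight(cohort_rec, registry_rec, field_probs=FIELD_PROBS):
    """Sum log2 agreement/disagreement weights over fields present in both records."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a = cohort_rec.get(field)
        b = registry_rec.get(field)
        if a is None or b is None:
            continue  # a missing field contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=5.0, upper=15.0):
    """Map a total weight to a decision; thresholds are assumed, not from the study."""
    if weight >= upper:
        return "match"
    if weight < lower:
        return "non-match"
    return "manual review"


registry = {"last_name": "SMITH", "first_name": "ANN",
            "date_of_birth": "1960-01-31", "ssn_last4": "1234"}
partial_ssn = dict(registry)                       # cohort record with last four digits
missing_ssn = {**registry, "ssn_last4": None}      # cohort record with no SSN

w_partial = match_weight(partial_ssn, registry)
w_missing = match_weight(missing_ssn, registry)
print(classify(w_partial), classify(w_missing))
```

In this toy setup the record without SSN scores lower than the one with a partial SSN, but agreement on name and date of birth can still carry it above the match threshold, which is the intuition behind linking on other identifiers.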

Results: Before manual review, sensitivity for missing and partial SSN was 92.5%. Sensitivity improved to 98.6% for missing SSN and 98.8% for partial SSN after manual review. We observed no notable heterogeneity by sex, age, or race-ethnicity, with sensitivity exceeding 87% across all subgroups. Manual review substantially reduced uncertain matches, contributing to high linkage accuracy.
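The sensitivity figures above can be read as the share of reference links that a restricted linkage recovers. The sketch below assumes that the complete-SSN linkage serves as the reference standard, which the abstract implies but does not state; the function name and the toy link pairs are invented for illustration and are not study data.

```python
def sensitivity(reference_links, test_links):
    """Fraction of reference (gold-standard) links also found by the test linkage."""
    reference_links = set(reference_links)
    if not reference_links:
        raise ValueError("reference linkage contains no links")
    true_positives = len(reference_links & set(test_links))
    return true_positives / len(reference_links)


# Toy (participant_id, registry_case_id) pairs, invented for illustration.
complete_ssn_links = {(1, "A"), (2, "B"), (3, "C"), (4, "D")}
missing_ssn_links = {(1, "A"), (2, "B"), (3, "C"), (5, "E")}

print(sensitivity(complete_ssn_links, missing_ssn_links))  # 0.75
```

Links found only by the restricted linkage, such as (5, "E") here, do not affect sensitivity; they would count against a separate measure such as positive predictive value.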

Discussion: This study demonstrates that high sensitivity in cancer registry linkage can be achieved without complete SSN, provided other personally identifiable information (e.g., name, date of birth, longitudinal address) is available. These findings support the feasibility of accurate cancer case identification in cohorts with limited SSN data, particularly for historically marginalized populations, and underscore the importance of designing inclusive population-based cancer studies.

Source journal
Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 6.70
Self-citation rate: 3.70%
Articles published: 177
Review time: 6-12 weeks
About the journal: Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.