Sensitivity of cancer registry linkage with missing or incomplete social security number and implications for cancer cohorts
Lauren E McCullough, Anusila Deka, Christina Newton, Peter Briggs, Erin Gardner, Kevin C Ward, Lauren R Teras, Alpa V Patel
Epidemiology. Journal Article. Published 2025-09-09. DOI: 10.1097/EDE.0000000000001913 (https://doi.org/10.1097/EDE.0000000000001913)
Abstract
Background: Linking cancer cohort participants to state cancer registries typically relies on personally identifiable information, including Social Security Numbers (SSN), which uniquely identify individuals. However, complete SSN collection can be limited due to privacy concerns. This study evaluates the sensitivity of cancer registry linkage using partial or missing SSN and examines differences by demographic characteristics.
Methods: Using data from 284,361 participants in the Cancer Prevention Study-3 (CPS-3), we conducted probabilistic linkages with cancer registries in Georgia, Ohio, and Texas using Match*Pro software. Participants were linked using combinations of personally identifiable information: complete SSN, partial SSN (last four digits), and missing SSN. We compared the sensitivity of linkages before and after manual review and stratified by sex, age, and race-ethnicity.
Results: Before manual review, sensitivity for missing and partial SSN was 92.5%. Sensitivity improved to 98.6% for missing SSN and 98.8% for partial SSN after manual review. We observed no notable heterogeneity by sex, age, or race-ethnicity, with sensitivity exceeding 87% across all subgroups. Manual review substantially reduced uncertain matches, contributing to high linkage accuracy.
Discussion: This study demonstrates that high sensitivity in cancer registry linkage can be achieved without complete SSN, provided other personally identifiable information (e.g., name, date of birth, longitudinal address) is available. These findings support the feasibility of accurate cancer case identification in cohorts with limited SSN data, particularly for historically marginalized populations, and underscore the importance of designing inclusive population-based cancer studies.
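The sensitivity figures reported above can be illustrated with a minimal sketch. It assumes that the matches found with complete SSN serve as the reference standard and that sensitivity is the share of those reference matches recovered by a linkage run without full SSN; the abstract does not spell out this definition, so treat it as an assumption. The function names, identifiers, and toy data are hypothetical and are not taken from the study or from Match*Pro.

```python
from collections import defaultdict


def linkage_sensitivity(gold_pairs, linked_pairs):
    """Share of reference (gold-standard) matches that the test linkage also found."""
    gold = set(gold_pairs)
    if not gold:
        raise ValueError("reference standard contains no matches")
    return len(gold & set(linked_pairs)) / len(gold)


def sensitivity_by_group(gold_pairs, linked_pairs, group_of):
    """Sensitivity within each subgroup.

    group_of maps a participant id (first element of each pair) to a
    subgroup label such as sex, age band, or race-ethnicity.
    """
    linked = set(linked_pairs)
    found = defaultdict(int)
    total = defaultdict(int)
    for pair in set(gold_pairs):
        label = group_of[pair[0]]
        total[label] += 1
        found[label] += pair in linked  # True counts as 1
    return {label: found[label] / total[label] for label in total}


# Toy data (hypothetical): (participant id, registry record id) pairs.
gold = [("p1", "r1"), ("p2", "r2"), ("p3", "r3"), ("p4", "r4")]   # assumed complete-SSN linkage
no_ssn = [("p1", "r1"), ("p2", "r2"), ("p3", "r3"), ("p5", "r9")]  # assumed linkage without SSN
sex = {"p1": "F", "p2": "F", "p3": "M", "p4": "M"}

overall = linkage_sensitivity(gold, no_ssn)
by_sex = sensitivity_by_group(gold, no_ssn, sex)
print(overall, by_sex)
```

In this sketch the extra pair ("p5", "r9") does not affect sensitivity, because sensitivity only counts how many reference matches were recovered; false links would be captured by a separate measure such as positive predictive value.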
About the journal:
Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.