Data-integration with pseudoweights and survey-calibration: application to developing US-representative lung cancer risk models for use in screening.

IF 1.5 3区 数学 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS
Lingxiao Wang, Yan Li, Barry I Graubard, Hormuzd A Katki
{"title":"Data-integration with pseudoweights and survey-calibration: application to developing US-representative lung cancer risk models for use in screening.","authors":"Lingxiao Wang, Yan Li, Barry I Graubard, Hormuzd A Katki","doi":"10.1093/jrsssa/qnae059","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate cancer risk estimation is crucial to clinical decision-making, such as identifying high-risk people for screening. However, most existing cancer risk models incorporate data from epidemiologic studies, which usually cannot represent the target population. While population-based health surveys are ideal for making inference to the target population, they typically do not collect time-to-cancer incidence data. Instead, time-to-cancer specific mortality is often readily available on surveys via linkage to vital statistics. We develop calibrated pseudoweighting methods that integrate individual-level data from a cohort and a survey, and summary statistics of cancer incidence from national cancer registries. By leveraging individual-level cancer mortality data in the survey, the proposed methods impute time-to-cancer incidence for survey sample individuals and use survey calibration with auxiliary variables of influence functions generated from Cox regression to improve robustness and efficiency of the inverse-propensity pseudoweighting method in estimating pure risks. We develop a lung cancer incidence pure risk model from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial using our proposed methods by integrating data from the National Health Interview Survey and cancer registries.</p>","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 1","pages":"119-139"},"PeriodicalIF":1.5000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728053/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Royal Statistical Society Series A-Statistics in Society","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jrsssa/qnae059","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Accurate cancer risk estimation is crucial to clinical decision-making, such as identifying high-risk people for screening. However, most existing cancer risk models incorporate data from epidemiologic studies, which usually cannot represent the target population. While population-based health surveys are ideal for making inference to the target population, they typically do not collect time-to-cancer incidence data. Instead, time-to-cancer specific mortality is often readily available on surveys via linkage to vital statistics. We develop calibrated pseudoweighting methods that integrate individual-level data from a cohort and a survey, and summary statistics of cancer incidence from national cancer registries. By leveraging individual-level cancer mortality data in the survey, the proposed methods impute time-to-cancer incidence for survey sample individuals and use survey calibration with auxiliary variables of influence functions generated from Cox regression to improve robustness and efficiency of the inverse-propensity pseudoweighting method in estimating pure risks. We develop a lung cancer incidence pure risk model from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial using our proposed methods by integrating data from the National Health Interview Survey and cancer registries.

数据整合与伪权和调查校准:应用于开发具有美国代表性的肺癌风险模型用于筛查。
准确的癌症风险评估对临床决策至关重要,例如确定需要筛查的高危人群。然而,大多数现有的癌症风险模型纳入了来自流行病学研究的数据,这些数据通常不能代表目标人群。虽然以人口为基础的健康调查对于推断目标人群是理想的,但它们通常不收集癌症发病时间的数据。相反,通过与生命统计数据的联系,通常可以很容易地从调查中获得癌症特定时间的死亡率。我们开发了校准的伪加权方法,整合了来自队列和调查的个人水平数据,以及来自国家癌症登记处的癌症发病率汇总统计数据。通过利用调查中个体水平的癌症死亡率数据,提出的方法为调查样本个体估算癌症发病时间,并使用Cox回归产生的影响函数辅助变量对调查进行校准,以提高反倾向伪加权法在估计纯风险方面的稳健性和效率。我们通过整合来自全国健康访谈调查和癌症登记处的数据,采用我们提出的方法,从前列腺癌、肺癌、结直肠癌和卵巢癌筛查试验中建立了肺癌发病率纯风险模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
2.90
自引率
5.00%
发文量
136
审稿时长
>12 weeks
期刊介绍: Series A (Statistics in Society) publishes high quality papers that demonstrate how statistical thinking, design and analyses play a vital role in all walks of life and benefit society in general. There is no restriction on subject-matter: any interesting, topical and revelatory applications of statistics are welcome. For example, important applications of statistical and related data science methodology in medicine, business and commerce, industry, economics and finance, education and teaching, physical and biomedical sciences, the environment, the law, government and politics, demography, psychology, sociology and sport all fall within the journal''s remit. The journal is therefore aimed at a wide statistical audience and at professional statisticians in particular. Its emphasis is on well-written and clearly reasoned quantitative approaches to problems in the real world rather than the exposition of technical detail. Thus, although the methodological basis of papers must be sound and adequately explained, methodology per se should not be the main focus of a Series A paper. Of particular interest are papers on topical or contentious statistical issues, papers which give reviews or exposés of current statistical concerns and papers which demonstrate how appropriate statistical thinking has contributed to our understanding of important substantive questions. Historical, professional and biographical contributions are also welcome, as are discussions of methods of data collection and of ethical issues, provided that all such papers have substantial statistical relevance.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信