Mastering rare event analysis: subsample-size determination in Cox and logistic regressions.

IF 1.7 4区 数学 Q3 BIOLOGY
Biometrics Pub Date : 2025-07-03 DOI:10.1093/biomtc/ujaf110
Tal Agassi, Nir Keret, Malka Gorfine
{"title":"Mastering rare event analysis: subsample-size determination in Cox and logistic regressions.","authors":"Tal Agassi, Nir Keret, Malka Gorfine","doi":"10.1093/biomtc/ujaf110","DOIUrl":null,"url":null,"abstract":"<p><p>In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal subsampling methods for conducting analyses on subsamples with minimized efficiency loss, they notably lack tools for judiciously selecting the subsample size. To bridge this gap, our work introduces tools designed for choosing the subsample size. We focus on three settings: the Cox regression model for survival data with rare events, and logistic regression for both balanced and imbalanced datasets. Additionally, we present a new optimal subsampling procedure tailored to logistic regression with imbalanced data. The efficacy of these tools and procedures is demonstrated through an extensive simulation study and meticulous analyses of two sizable datasets: survival analysis of UK Biobank colorectal cancer data with about 350 million rows and logistic regression of linked birth and infant death data with about 28 million observations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomtc/ujaf110","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal subsampling methods for conducting analyses on subsamples with minimized efficiency loss, they notably lack tools for judiciously selecting the subsample size. To bridge this gap, our work introduces tools designed for choosing the subsample size. We focus on three settings: the Cox regression model for survival data with rare events, and logistic regression for both balanced and imbalanced datasets. Additionally, we present a new optimal subsampling procedure tailored to logistic regression with imbalanced data. The efficacy of these tools and procedures is demonstrated through an extensive simulation study and meticulous analyses of two sizable datasets: survival analysis of UK Biobank colorectal cancer data with about 350 million rows and logistic regression of linked birth and infant death data with about 28 million observations.

掌握罕见事件分析:在Cox和逻辑回归中确定子样本大小。
在当代数据分析领域,大量数据集的使用已经变得越来越重要,尽管通常需要大量的计算时间和内存。虽然许多现有的工作提供了以最小的效率损失对子样本进行分析的最佳子抽样方法,但它们明显缺乏明智地选择子样本大小的工具。为了弥补这一差距,我们的工作引入了用于选择子样本大小的工具。我们关注三种设置:针对罕见事件的生存数据的Cox回归模型,以及针对平衡和不平衡数据集的逻辑回归。此外,我们提出了一种新的最优子抽样程序,适合于不平衡数据的逻辑回归。通过对两个大型数据集的广泛模拟研究和细致分析,证明了这些工具和程序的有效性:对英国生物银行(UK Biobank)约3.5亿行结直肠癌数据的生存分析,以及对约2800万观察值的相关出生和婴儿死亡数据的逻辑回归。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Biometrics
Biometrics 生物-生物学
CiteScore
2.70
自引率
5.30%
发文量
178
审稿时长
4-8 weeks
期刊介绍: The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also Society sponsors regional and local meetings.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信