Scalable Algorithms for Large Competing Risks Data.

Journal of Computational and Graphical Statistics (a joint publication of the American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America). Pub Date: 2021-01-01. Epub Date: 2020-12-11. DOI: 10.1080/10618600.2020.1841650

Eric S Kawaguchi, Jenny I Shen, Marc A Suchard, Gang Li

Pages: 685-693

Citations: 5

Abstract



This paper develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate ℓ₀-based iteratively reweighted ℓ₂-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs a cyclic update of each coordinate using an explicit thresholding formula. The new cycBAR algorithm effectively avoids fitting multiple reweighted ℓ₂-penalizations and thus yields impressive speedups over the original BAR algorithm. Second, we address a pivotal computational issue related to fitting the PSH model. Specifically, the computation costs of the log-pseudo-likelihood and its derivatives for the PSH model grow at the rate of O(n²) with the sample size n in current implementations. We propose a novel forward-backward scan algorithm that reduces the computation costs to O(n). The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms can yield >1,000-fold speedups over the original BAR algorithm. Illustrations of the impressive scalability of our proposed algorithm for large competing risks data are given using both simulations and United States Renal Data System data. Supplementary materials for this article are available online.
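The abstract does not state the cycBAR thresholding formula, so the following is only a simplified illustration of why a reweighted ℓ₂ iteration can collapse to an explicit threshold. It swaps in a one-coordinate least-squares objective for the PSH log-pseudo-likelihood, an assumption made here and not the authors' setting. With the hypothetical quantities `c` (inner product of the coordinate with the partial residual), `a` (squared norm of the coordinate) and `lam` (penalty), each reweighted ridge step is b ← c / (a + lam / b²), and any nonzero fixed point solves a·b² − c·b + lam = 0. The function names are invented for this sketch.

```python
import math


def bar_coordinate_limit(c, a, lam):
    """Closed-form limit for the one-coordinate reweighted ridge iteration.

    Assumed toy objective per step: 0.5*||r - x*b||^2 + 0.5*lam*b^2/b_old^2,
    with c = x'r and a = x'x. This is NOT the PSH-model formula of the paper.
    """
    disc = c * c - 4.0 * a * lam
    if disc < 0.0:
        # No nonzero fixed point exists: the coordinate is thresholded to zero.
        return 0.0
    # Larger-magnitude root of a*b^2 - c*b + lam = 0, with the sign of c.
    return (c + math.copysign(math.sqrt(disc), c)) / (2.0 * a)


def bar_coordinate_iterate(c, a, lam, b0, n_iter=10_000):
    """Run the reweighted ridge steps explicitly, for comparison."""
    b = b0
    for _ in range(n_iter):
        if b == 0.0:
            break
        # Same update as c / (a + lam / b**2), written to avoid dividing by b**2.
        b = c * b * b / (a * b * b + lam)
    return b


if __name__ == "__main__":
    # Strong signal: both routes agree on a nonzero value.
    print(bar_coordinate_limit(3.0, 1.0, 1.0))
    print(bar_coordinate_iterate(3.0, 1.0, 1.0, b0=3.0))
    # Weak signal: both routes give zero.
    print(bar_coordinate_limit(1.0, 1.0, 1.0))
    print(bar_coordinate_iterate(1.0, 1.0, 1.0, b0=1.0))
```

In this toy setting the closed form replaces the inner loop of repeated ridge fits with a single evaluation per coordinate. The iteration reaches the nonzero root only when started far enough from zero (here, from the unpenalized value c / a).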
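The scan idea can be sketched for the weighted risk-set sums that appear in the denominator of the Fine-Gray pseudo-likelihood. This is a minimal sketch under stated assumptions, not the authors' implementation: observed times are distinct, causes are coded 0 (censored), 1 (event of interest) and 2 (competing event), and the array `G` holds an already computed estimate of the censoring survival function at each subject's own observed time. Subjects still under observation contribute exp(eta), and subjects who already had a competing event contribute exp(eta) scaled by G(tᵢ) / G(tⱼ). All function and variable names are hypothetical.

```python
import math


def denominators_naive(time, cause, eta, G):
    """O(n^2) reference: loop over every pair of subjects."""
    n = len(time)
    out = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if time[j] >= time[i]:
                # Subject j is still under observation at time[i].
                s += math.exp(eta[j])
            elif cause[j] == 2:
                # Earlier competing event: kept in the risk set with a weight.
                s += math.exp(eta[j]) * G[i] / G[j]
        out.append(s)
    return out


def denominators_scan(time, cause, eta, G):
    """Two linear passes over the subjects sorted by time (assumes no ties)."""
    n = len(time)
    order = sorted(range(n), key=lambda k: time[k])
    out = [0.0] * n

    # Backward scan: running sum of exp(eta) over subjects with time >= time[k].
    running = 0.0
    for k in reversed(order):
        running += math.exp(eta[k])
        out[k] = running

    # Forward scan: running sum of exp(eta)/G over earlier competing events.
    running = 0.0
    for k in order:
        out[k] += G[k] * running
        if cause[k] == 2:
            running += math.exp(eta[k]) / G[k]
    return out


if __name__ == "__main__":
    time = [2.0, 5.0, 1.0, 4.0, 3.0]
    cause = [1, 2, 2, 0, 1]
    eta = [0.1, -0.3, 0.5, 0.0, 0.2]
    G = [0.9, 0.6, 1.0, 0.7, 0.8]  # assumed censoring survival at each time
    print(denominators_naive(time, cause, eta, G))
    print(denominators_scan(time, cause, eta, G))
```

The sort adds an O(n log n) step that is done once. After that, each evaluation of the sums needs only the two passes, which is where the linear cost comes from in this sketch.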

