Eric S Kawaguchi, Jenny I Shen, Marc A Suchard, Gang Li
Journal of Computational and Graphical Statistics, pp. 685-693. Published 2021-01-01 (Epub 2020-12-11). DOI: https://doi.org/10.1080/10618600.2020.1841650
Scalable Algorithms for Large Competing Risks Data.
This paper develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge (BAR) method, a surrogate ℓ₀-based iteratively reweighted ℓ₂-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs cyclic updates of each coordinate using an explicit thresholding formula. The new cycBAR algorithm effectively avoids fitting multiple reweighted ℓ₂-penalizations and thus yields impressive speedups over the original BAR algorithm. Second, we address a pivotal computational issue related to fitting the PSH model. Specifically, in current implementations the computational cost of the log-pseudo likelihood and its derivatives for the PSH model grows at the rate of O(n²) with the sample size n. We propose a novel forward-backward scan algorithm that reduces the computational cost to O(n). The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms can yield more than 1,000-fold speedups over the original BAR algorithm. The scalability of the proposed algorithms for large competing risks data is illustrated using both simulations and a United States Renal Data System dataset. Supplementary materials for this article are available online.
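The forward-backward scan idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the usual Fine-Gray weighting, in which a subject who had a competing event at an earlier time stays in the risk set with weight G(t)/G(t_j), where G is an estimate of the censoring survival function. The function names, the input array `G` (G evaluated at each subject's observed time), the cause coding (0 = censored, 1 = event of interest, 2 = competing event), and the assumption of distinct observed times are all hypothetical choices made for illustration. The sketch compares a direct double loop with one backward and one forward running sum over time-sorted subjects.

```python
def risk_sums_naive(times, causes, scores, G):
    """O(n^2) reference: weighted risk-set sum for every subject."""
    n = len(times)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if times[j] >= times[i]:
                # Subject j is still under observation at times[i].
                total += scores[j]
            elif causes[j] == 2:
                # Earlier competing event: kept in the risk set, down-weighted.
                total += scores[j] * G[i] / G[j]
        out.append(total)
    return out


def risk_sums_scan(times, causes, scores, G):
    """Same sums via one backward and one forward pass over sorted subjects.

    Assumes distinct observed times and G[k] > 0 for all k.
    The sort is O(n log n) and done once; each scan is O(n).
    """
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    out = [0.0] * n

    # Backward scan: running sum of scores over subjects with time >= times[k].
    running = 0.0
    for k in reversed(order):
        running += scores[k]
        out[k] = running

    # Forward scan: running sum of scores[j] / G[j] over competing events
    # strictly before times[k], rescaled by G[k] when added.
    running = 0.0
    for k in order:
        out[k] += G[k] * running
        if causes[k] == 2:
            running += scores[k] / G[k]
    return out


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    times = [2.0, 5.0, 1.0, 4.0, 3.0]
    causes = [1, 0, 2, 1, 2]
    scores = [1.0, 2.0, 0.5, 1.5, 1.0]   # stands in for exp(x_j' beta)
    G = [1.0, 0.8, 1.0, 0.9, 0.9]        # assumed censoring survival values
    print(risk_sums_naive(times, causes, scores, G))
    print(risk_sums_scan(times, causes, scores, G))
```

The factorization works because the weight G(t_i)/G(t_j) separates into a term depending only on i and a term depending only on j, so the inner sum over earlier competing events can be carried forward as a single accumulator. Tied event times would need extra handling that this sketch omits.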