The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies

Impact factor 1.8 · CAS Zone 4 (Medicine) · JCR Q3, Mathematical & Computational Biology
Yuwen Liu, Jie Ren, Shuangge Ma, Cen Wu
DOI: 10.1002/sim.10196 (https://doi.org/10.1002/sim.10196) · Journal: Statistics in Medicine · Published: 2024-09-11 · Type: Journal Article
Citations: 0

Abstract

The spike‐and‐slab quantile LASSO for robust variable selection in cancer genomics studies
Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy‐tailed distributions in the complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to the nonrobust ones to identify important genes associated with heterogeneous disease traits and build superior predictive models. In this study, to keep the remarkable features of the quantile LASSO and fully Bayesian regularized quantile regression while overcoming their disadvantage in the analysis of high‐dimensional genomics data, we propose the spike‐and‐slab quantile LASSO through a fully Bayesian spike‐and‐slab formulation under the robust likelihood by adopting the asymmetric Laplace distribution (ALD). The proposed robust method has inherited the prominent properties of selective shrinkage and self‐adaptivity to the sparsity pattern from the spike‐and‐slab LASSO (Ročková and George, J Am Stat Assoc, 2018, 113(521): 431–444). Furthermore, the spike‐and‐slab quantile LASSO has a computational advantage to locate the posterior modes via soft‐thresholding rule guided Expectation‐Maximization (EM) steps in the coordinate descent framework, a phenomenon rarely observed for robust regularization with nondifferentiable loss functions. We have conducted comprehensive simulation studies with a variety of heavy‐tailed errors in both homogeneous and heterogeneous model settings to demonstrate the superiority of the spike‐and‐slab quantile LASSO over its competing methods. The advantage of the proposed method has been further demonstrated in case studies of the lung adenocarcinomas (LUAD) and skin cutaneous melanoma (SKCM) data from The Cancer Genome Atlas (TCGA).
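The abstract names two standard building blocks: the asymmetric Laplace distribution (ALD) used as a robust likelihood, and the soft-thresholding rule. The sketch below illustrates only these two textbook ingredients. It is not the authors' EM or coordinate descent algorithm. The function names are hypothetical, and the ALD parameterization (location `mu`, scale `sigma`, quantile level `tau`) is an assumption based on the common form used in Bayesian quantile regression.

```python
import math


def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_neg_log_density(y, mu, sigma, tau):
    """Negative log-density of the ALD under the assumed parameterization
    f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma)).
    Up to a constant in y, this equals the check loss, which is why an
    ALD likelihood corresponds to quantile regression."""
    return check_loss((y - mu) / sigma, tau) - math.log(tau * (1.0 - tau) / sigma)


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


if __name__ == "__main__":
    # A positive residual at the median (tau = 0.5) costs half its size.
    print(check_loss(2.0, 0.5))
    # Small values are shrunk to zero, larger ones are pulled toward zero.
    print(soft_threshold(-0.5, 1.0), soft_threshold(3.0, 1.0))
```

Because the ALD negative log-density differs from the check loss only by a term that does not depend on `y`, maximizing an ALD likelihood in the location parameter is equivalent to minimizing the quantile check loss, which is how the ALD supplies the "robust likelihood" mentioned in the abstract.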
Source journal
Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)
CiteScore: 3.40
Self-citation rate: 10.00%
Articles published: 334
Review time: 2-4 weeks
About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case-studies where creative use or technical generalizations of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.