l1-Penalised Ordinal Polytomous Regression Estimators with Application to Gene Expression Studies

Workshop on Algorithms in Bioinformatics. Pub Date: 2018-08-20. DOI: 10.4230/LIPIcs.WABI.2018.17

S. Chrétien, C. Guyeux, S. Moulin


Citations: 1

Abstract



Qualitative but ordered random variables, such as the severity of a pathology, are of paramount importance in biostatistics and medicine. The conditional distribution of such a variable given other explanatory variables can be modelled with a specific regression model known as ordinal polytomous regression. Variable selection in the ordinal polytomous regression model is a computationally difficult combinatorial optimisation problem, yet it is crucial when practitioners need to understand which covariates are physically related to the output and which are not. One easy way to circumvent the computational hardness of variable selection is to introduce a penalised maximum likelihood estimator based on a well-chosen non-smooth penalisation function such as the l_1-norm. In the case of the Gaussian linear model, the l_1-penalised least-squares estimator, also known as the LASSO estimator, has attracted a lot of attention in the last decade, from both the theoretical and algorithmic viewpoints. However, even in the Gaussian linear model, accurate calibration of the relaxation parameter, i.e., the relative weight of the penalisation term in the estimation cost function, is still considered a difficult problem that has to be addressed with caution. In the present paper, we apply l_1-penalisation to the ordinal polytomous regression model and compare several hyper-parameter calibration strategies. Our main contributions are: (a) a useful and simple l_1-penalised estimator for ordinal polytomous regression, together with a thorough description of how to apply Nesterov's accelerated gradient and the online Frank-Wolfe methods to compute this estimator; (b) a new hyper-parameter calibration method for the proposed model, based on the QUT idea of Giacobino et al.; and (c) freely usable code that implements the proposed estimation procedure.
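
To make the idea of an l_1-penalised ordinal estimator concrete, here is a minimal sketch in plain Python. It is not the authors' code, and it rests on several assumptions that the abstract does not state: a cumulative logit (proportional odds) link, cut-points that are fixed in advance rather than estimated, and FISTA (the proximal variant of Nesterov's accelerated gradient) as the solver. The names `fit_l1_ordinal`, `nll_gradient`, `cuts` and the toy data are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function; also handles +/- infinity."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v towards zero by tau."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def nll_gradient(beta, X, y, cuts):
    """Gradient in beta of the mean negative log-likelihood.

    Assumed model: P(Y = k | x) = sigmoid(cuts[k+1] - x.beta) - sigmoid(cuts[k] - x.beta),
    for which d(-log P)/d(x.beta) simplifies to 1 - upper - lower.
    """
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for x_i, y_i in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        upper = sigmoid(cuts[y_i + 1] - eta)
        lower = sigmoid(cuts[y_i] - eta)
        w = (1.0 - upper - lower) / n
        for j in range(p):
            grad[j] += w * x_i[j]
    return grad


def fit_l1_ordinal(X, y, cuts, lam, n_iter=500):
    """FISTA for: mean negative log-likelihood + lam * ||beta||_1.

    Assumes X is not identically zero, so the Lipschitz bound is positive.
    """
    n, p = len(X), len(X[0])
    # The curvature in x.beta is at most 1/2, so the gradient is Lipschitz
    # with constant at most sum of squared entries of X divided by 2n.
    step = 2.0 * n / sum(v * v for row in X for v in row)
    beta = [0.0] * p
    z = beta[:]  # extrapolated point
    t = 1.0      # momentum parameter
    for _ in range(n_iter):
        g = nll_gradient(z, X, y, cuts)
        beta_new = [soft_threshold(zj - step * gj, step * lam)
                    for zj, gj in zip(z, g)]
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [bn + (t - 1.0) / t_new * (bn - b)
             for bn, b in zip(beta_new, beta)]
        beta, t = beta_new, t_new
    return beta


# Hypothetical toy data: three ordered classes (0 < 1 < 2), two covariates.
X_toy = [[-2.0, 0.3], [-1.5, -0.4], [-1.0, 0.1], [-0.5, -0.2], [0.0, 0.5],
         [0.5, -0.3], [1.0, 0.2], [1.5, -0.1], [2.0, 0.4], [2.5, -0.5]]
y_toy = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
cuts_toy = [-math.inf, -1.0, 1.0, math.inf]  # assumed known cut-points

if __name__ == "__main__":
    for lam in (1.0, 0.1):
        print(lam, fit_l1_ordinal(X_toy, y_toy, cuts_toy, lam))
```

The role of the hyper-parameter shows up directly in this sketch: once `lam` is at least as large as the largest absolute entry of the gradient at `beta = 0`, the soft-thresholding step returns the zero vector and no covariate is selected, while smaller values let covariates enter the model. A full implementation would also estimate the cut-points under an ordering constraint, which is omitted here for brevity.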
