HB-PLS: A statistical method for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression.

Forestry Research. Published 2021-03-30 (eCollection 2021-01-01). DOI: 10.48130/FR-2021-0006
Wenping Deng, Kui Zhang, Cheng He, Sanzhen Liu, Hairong Wei
Volume 1, page 6. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524267/pdf/

Abstract

Gene expression data feature high dimensionality, multicollinearity, and non-Gaussian noise, which hinder the identification of the true regulatory genes controlling a biological process or pathway. In this study, we integrated the Huber loss function and the Berhu penalty (HB) into the partial least squares (PLS) framework to handle the high dimensionality and multicollinearity of gene expression data, and developed a new method, HB-PLS regression, to model the relationships between regulatory genes and pathway genes. To solve the Huber-Berhu optimization problem, we developed an accelerated proximal gradient descent algorithm that is at least 10 times faster than the general-purpose convex optimization solver CVX. Applying HB-PLS to identify pathway regulators of lignin biosynthesis and photosynthesis in Arabidopsis thaliana recovered many known positive pathway regulators that had previously been validated experimentally. Compared with sparse partial least squares (SPLS) regression, an efficient method for variable selection and dimension reduction under multicollinearity, HB-PLS identified considerably more known positive regulators but showed slightly lower sensitivity/(1 − specificity) in ranking the true known positive regulators toward the top of the output regulatory gene lists for both pathways. In addition, each method identified some unique regulators that the other method did not. Overall, HB-PLS slightly outperformed SPLS, but both methods are useful for identifying real pathway regulators from high-throughput gene expression data, suggesting that integrating statistics, machine learning, and convex optimization can yield highly effective methods that merit further exploration.
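The abstract names three ingredients: the Huber loss (a robust data-fit term), the Berhu or "reverse Huber" penalty, and an accelerated proximal gradient solver. With threshold delta, the Huber loss is r²/2 for |r| ≤ delta and delta(|r| − delta/2) otherwise; with threshold c, the Berhu penalty is |b| for |b| ≤ c and (b² + c²)/(2c) otherwise, so it acts like the lasso on small coefficients and like ridge on large ones. The Python below is a minimal sketch, assuming a single-response linear regression rather than the PLS component extraction that HB-PLS actually embeds the problem in. It minimizes Σ huber(y_i − x_iᵀβ) + lam·Σ berhu(β_j) with FISTA, a standard accelerated proximal gradient method. All function names (huber_loss, berhu_penalty, prox_berhu, fista_huber_berhu) and the parameters delta, c and lam are hypothetical illustrations, not the authors' code. The closed-form Berhu proximal step is what keeps each iteration cheap.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def huber_loss(r: float, delta: float) -> float:
    """Huber loss: quadratic for |r| <= delta, linear beyond (robust to outliers)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def huber_grad(r: float, delta: float) -> float:
    """Derivative of huber_loss w.r.t. r: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def berhu_penalty(b: float, c: float) -> float:
    """Reverse Huber (Berhu): |b| for |b| <= c, (b^2 + c^2) / (2c) beyond."""
    a = abs(b)
    return a if a <= c else (b * b + c * c) / (2.0 * c)


def prox_berhu(v: float, tau: float, c: float) -> float:
    """Closed-form argmin_x 0.5 * (x - v)**2 + tau * berhu_penalty(x, c)."""
    a = abs(v)
    if a <= tau:
        return 0.0                         # small inputs are set exactly to zero
    if a <= c + tau:
        return math.copysign(a - tau, v)   # lasso-like soft thresholding
    return v * c / (c + tau)               # ridge-like proportional shrinkage


def fista_huber_berhu(X: Matrix, y: Sequence[float], lam: float,
                      delta: float, c: float, iters: int = 5000,
                      tol: float = 1e-12) -> Vector:
    """Hypothetical sketch: minimize sum_i huber(y_i - x_i.b) + lam * sum_j berhu(b_j)."""
    n, p = len(X), len(X[0])
    # huber_grad is 1-Lipschitz, so ||X||_2^2 bounds the Lipschitz constant of the
    # data-term gradient; the squared Frobenius norm is a cheap upper bound on it.
    lip = sum(v * v for row in X for v in row)
    step = 1.0 / lip
    b = [0.0] * p
    z = b[:]      # extrapolated (momentum) point
    t = 1.0
    for _ in range(iters):
        # Gradient of the Huber data term at z: -X^T psi(y - X z).
        resid = [y[i] - sum(X[i][j] * z[j] for j in range(p)) for i in range(n)]
        psi = [huber_grad(r, delta) for r in resid]
        grad = [-sum(X[i][j] * psi[i] for i in range(n)) for j in range(p)]
        # Proximal (Berhu) step applied coordinate-wise after a gradient step.
        b_new = [prox_berhu(z[j] - step * grad[j], step * lam, c) for j in range(p)]
        # FISTA momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [b_new[j] + ((t - 1.0) / t_new) * (b_new[j] - b[j]) for j in range(p)]
        done = max(abs(b_new[j] - b[j]) for j in range(p)) < tol
        b, t = b_new, t_new
        if done:
            break
    return b
```

For instance, on a small design X with response y, fista_huber_berhu(X, y, lam=1.0, delta=10.0, c=1.0) returns coefficients in which weak effects are zeroed by the lasso-like part of Berhu while strong effects are only shrunk proportionally; residuals larger than delta contribute a bounded gradient, which is where the Huber loss gains its robustness. How delta, c and lam should be chosen (for example, by cross-validation) is left open in this sketch.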
