Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies.

Journal of data science : JDS, 20(1), 34-50. Pub Date: 2022-01-01. Epub Date: 2021-12-13. DOI: 10.6339/21-jds1030
Eric S Kawaguchi, Sisi Li, Garrett M Weaver, Juan Pablo Lewinger
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581069/pdf/

Abstract

There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically only used post-analysis to aid in the interpretation of any findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of "meta features" to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information, in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.
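
The recasting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the objective, so the code assumes one plausible form: minimize ||y - X b||^2 + lam1 ||b - Z a||^2 + lam2 ||a||^2, where X is the n-by-p feature matrix, Z is a p-by-q matrix of meta features, b holds the feature effects, and a holds the meta-feature effects. Substituting g = b - Z a gives an ordinary ridge problem on the augmented design [X, XZ]. The names `fit_hierarchical_ridge`, `ridge_coordinate_descent`, `lam1`, and `lam2` are hypothetical, and intercepts and standardization are ignored. The closed-form solve is included only as a check on the coordinate descent loop; it is not the authors' implementation.

```python
import numpy as np


def fit_hierarchical_ridge(X, y, Z, lam1, lam2):
    """Closed-form minimizer of the ASSUMED two-level ridge objective

        ||y - X b||^2 + lam1 * ||b - Z a||^2 + lam2 * ||a||^2

    X: (n, p) features, Z: (p, q) meta features, y: (n,) outcome.
    Returns (b, a): feature-level and meta-feature-level coefficients.
    """
    p, q = Z.shape
    # Substituting g = b - Z a turns the model into a single-level ridge
    # regression on the augmented design [X, XZ] with coefficients (g, a).
    W = np.hstack([X, X @ Z])
    penalty = np.concatenate([np.full(p, lam1), np.full(q, lam2)])
    theta = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    g, a = theta[:p], theta[p:]
    return Z @ a + g, a


def ridge_coordinate_descent(W, y, penalty, n_sweeps=5000):
    """Cyclic coordinate descent for ||y - W t||^2 + sum_j penalty[j] * t[j]^2."""
    t = np.zeros(W.shape[1])
    resid = np.array(y, dtype=float)  # residual y - W t for the current t
    col_sq = (W ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(W.shape[1]):
            # Exact one-dimensional minimizer with all other coordinates fixed.
            rho = W[:, j] @ resid + col_sq[j] * t[j]
            new_tj = rho / (col_sq[j] + penalty[j])
            resid -= W[:, j] * (new_tj - t[j])
            t[j] = new_tj
    return t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 6
    X = rng.standard_normal((n, p))
    # Hypothetical meta features: two groups of three features each.
    Z = np.kron(np.eye(2), np.ones((3, 1)))
    y = X @ (Z @ np.array([1.0, -1.0])) + rng.standard_normal(n)

    b, a = fit_hierarchical_ridge(X, y, Z, lam1=50.0, lam2=50.0)
    W = np.hstack([X, X @ Z])
    t = ridge_coordinate_descent(W, y, np.full(p + 2, 50.0))
    print("max |closed form - coordinate descent| =",
          np.abs(Z @ t[p:] + t[:p] - b).max())
```

Under this assumed objective, a large `lam1` pulls each feature effect toward the value predicted by its meta features (here, a shared group mean), while `lam2` controls how strongly the meta-feature effects themselves are shrunk toward zero.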
