A supervised approach for predicting patient survival with gene expression data.

Karthik Devarajan, Yan Zhou, Neeraj Chachra, Nader Ebrahimi
Proceedings. IEEE International Symposium on Bioinformatics and Bioengineering, 2010. DOI: 10.1109/BIBE.2010.14 (https://doi.org/10.1109/BIBE.2010.14)
Citation count: 2

Abstract

Rapid development in genomics in recent years has allowed the simultaneous measurement of the expression levels of thousands of genes using DNA microarrays. This has offered tremendous potential for growth in our understanding of the pathophysiology of many diseases. When microarray studies also contain information about an outcome variable such as time to an event or death, one of the goals of an investigator is to understand how the expression levels of genes (covariates) relate to the time-to-event (referred to as survival time) in the course of a disease.

In this article, we consider the case where the number of covariates, p, exceeds the number of observations, N, a setting typical of microarray gene expression data. For a given vector of responses representing survival times of N subjects and the corresponding p × N gene expression matrix, we examine the problem of predicting the survival probability when N ≪ p. This is an ill-conditioned problem further compounded by the presence of possibly censored survival times. We propose a model that combines the partial least squares approach for dimensionality reduction with the accelerated failure time model, a widely used log-linear model for linking censored survival time to covariates. We develop parametric methods to account for censoring as well as for predicting patient survival probabilities. We illustrate the applicability of our methods using cancer microarray data and explore the biological relevance of our results using pathway analysis. Finally, we evaluate the performance of our methods using extensive simulation studies.
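
As a rough illustration of the prediction step described above, the following sketch assumes the partial least squares stage has already produced K weight vectors, and that the accelerated failure time model takes the log-linear form log T = mu + beta . z + sigma * eps, where z holds a subject's K component scores. The log-normal error distribution, the function names, and all numeric values are assumptions made for this example; the abstract does not state which parametric family or estimation procedure the authors use.

```python
import math


def pls_scores(x, weights):
    """Project one subject's expression vector x (length p) onto K weight
    vectors (each of length p), giving K component scores.
    The weights are assumed to come from a previously fitted PLS step."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]


def aft_survival_lognormal(t, scores, mu, beta, sigma):
    """Survival probability S(t | x) under the assumed model
    log T = mu + beta . z + sigma * eps, with eps ~ N(0, 1)."""
    if t <= 0:
        return 1.0  # nobody has failed before time zero
    eta = mu + sum(b * z for b, z in zip(beta, scores))
    u = (math.log(t) - eta) / sigma
    # S(t) = 1 - Phi(u), written with erfc for stability in the upper tail
    return 0.5 * math.erfc(u / math.sqrt(2.0))


# Hypothetical toy values: 4 genes reduced to 2 components.
x = [1.0, 0.0, 2.0, -1.0]
weights = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
z = pls_scores(x, weights)
s_early = aft_survival_lognormal(1.0, z, mu=1.0, beta=[1.0, -1.0], sigma=0.5)
s_late = aft_survival_lognormal(10.0, z, mu=1.0, beta=[1.0, -1.0], sigma=0.5)
```

In this form the p-dimensional expression profile enters the survival model only through the K scores, which is what makes the N ≪ p setting tractable. Estimating mu, beta, and sigma from censored data is a separate step and is not shown here.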
