Selection of patient samples and genes for outcome prediction.

Proceedings. IEEE Computational Systems Bioinformatics Conference. Pub Date: 2004-01-01. DOI: 10.1109/csb.2004.1332451

Huiqing Liu, Jinyan Li, Limsoon Wong

Pages: 382-92


Abstract

Gene expression profiles with clinical outcome data enable monitoring of disease progression and prediction of patient survival at the molecular level. We present a new computational method for outcome prediction. Our idea is to train on an informative subset of the original training samples. This subset consists only of short-term survivors, who died within a short period, and long-term survivors, who were still alive after a long follow-up time. These extreme training samples provide a clear basis for identifying genes whose expression is related to survival. To find relevant genes, we combine two feature selection methods -- an entropy measure and the Wilcoxon rank sum test -- so that a set of sharply discriminating features is identified. The selected training samples and genes are then integrated by a support vector machine to build a prediction model, which assigns each validation sample a survival/relapse risk score used for drawing Kaplan-Meier survival curves. We apply this method to two data sets: diffuse large-B-cell lymphoma (DLBCL) and primary lung adenocarcinoma. In both cases, patients in the high and low risk groups stratified by our risk scores are clearly distinguishable. We also compare our risk scores to clinical factors, such as the International Prognostic Index score for DLBCL and tumor stage for lung adenocarcinoma. Our results indicate that gene expression profiles combined with carefully chosen learning algorithms can predict patient survival for certain diseases.
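Two steps of the abstract can be illustrated with a minimal, standard-library Python sketch: selecting the extreme training samples, and computing a Kaplan-Meier curve for a group of patients. This is not the authors' code. The `Patient` record, the function names, the cutoff values, and the toy cohort are all hypothetical, and treating every patient whose follow-up exceeds the long cutoff as a long-term survivor is one possible reading of the abstract. The gene selection and support vector machine steps are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patient:
    follow_up_years: float  # time to death or to last follow-up
    died: bool              # True if death was observed, False if censored


def select_extreme_samples(
    patients: List[Patient], short_cutoff: float, long_cutoff: float
) -> Tuple[List[Patient], List[Patient]]:
    """Split off short-term and long-term survivors; drop everyone else.

    Assumption: a patient whose follow-up exceeds long_cutoff was alive
    at long_cutoff, so they count as a long-term survivor.
    """
    short_term = [p for p in patients
                  if p.died and p.follow_up_years < short_cutoff]
    long_term = [p for p in patients if p.follow_up_years > long_cutoff]
    return short_term, long_term


def kaplan_meier(patients: List[Patient]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival probability) at each death time."""
    at_risk = len(patients)
    survival = 1.0
    curve = []
    for t in sorted({p.follow_up_years for p in patients}):
        at_t = [p for p in patients if p.follow_up_years == t]
        deaths = sum(1 for p in at_t if p.died)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= len(at_t)
    return curve


# Toy cohort with made-up follow-up times (years) and outcomes.
cohort = [
    Patient(0.5, True), Patient(0.8, True), Patient(2.0, False),
    Patient(3.0, True), Patient(6.0, False), Patient(7.5, True),
    Patient(9.0, False),
]

# Hypothetical cutoffs: died within 1 year vs. followed for more than 5 years.
short_term, long_term = select_extreme_samples(cohort, 1.0, 5.0)
curve = kaplan_meier(cohort)
```

In this toy cohort the two patients with intermediate follow-up (2.0 and 3.0 years) are excluded from training, which mirrors the idea of learning only from clearly separated outcomes. In the method described above, the Kaplan-Meier curves would be drawn separately for the high and low risk groups defined by the model's risk scores rather than for the whole cohort.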

