Mining cancer genes with running-sum statistics

Data and Text Mining in Bioinformatics. Pub Date: 2009-11-06. DOI: 10.1145/1651318.1651326

Inho Park, Kwang-H. Lee, Doheon Lee


Citations: 2

Abstract

In this paper, we propose a new method for detecting candidate cancer genes in cancer microarray datasets, with the aim of developing molecular biomarkers or therapeutic targets. To address the problems that the molecular heterogeneity of cancers causes for gene prioritization, the method is designed to identify genes that are over- or under-expressed not only across all cancer samples but also in only a subgroup of them. To this end, we propose the RS score for gene ranking, which is calculated with a weighted running-sum statistic over the ordered list of each gene's expression values. We apply the method to publicly available prostate cancer microarray datasets and show that it places well-known prostate cancer-associated genes such as ERG, HPN, and AMACR at the top of the candidate list. When samples are represented as vectors of the expression values of the top 20 genes and embedded into a two-dimensional space using commute time embedding, normal and cancer samples are separated in the independent test datasets as well as in the training datasets. We further evaluate the method by estimating classification performance on the independent test datasets, where it outperforms other cancer outlier profile approaches.
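
The abstract does not state the formula behind the RS score, so the following is only a rough sketch of what a weighted running-sum statistic over one gene's ordered expression values might look like. It assumes a GSEA-style scheme: samples are sorted by expression, cancer samples push the running sum up by a weight proportional to their distance from the median of the normal samples, normal samples pull it down by a constant step, and the score is the largest deviation from zero. The function name `rs_score`, the exponent `p`, and the choice of the normal median as the reference point are all hypothetical and are not taken from the paper.

```python
from statistics import median


def rs_score(values, labels, p=1.0):
    """Illustrative weighted running-sum score for a single gene.

    values: expression values, one per sample
    labels: 1 for a cancer sample, 0 for a normal sample
    p:      weighting exponent (an assumption, not from the paper)
    """
    normal = [v for v, y in zip(values, labels) if y == 0]
    if not normal:
        return 0.0
    center = median(normal)  # assumed reference expression level

    # Walk through samples from highest to lowest expression.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)

    # Total weight of cancer samples, used to normalise the upward steps.
    hit_total = sum(abs(values[i] - center) ** p
                    for i in order if labels[i] == 1)
    if hit_total == 0:
        return 0.0

    running, best = 0.0, 0.0
    for i in order:
        if labels[i] == 1:
            running += abs(values[i] - center) ** p / hit_total
        else:
            running -= 1.0 / len(normal)
        # Keep the signed deviation with the largest magnitude.
        if abs(running) > abs(best):
            best = running
    return best


# Toy data: the first gene is high in only two of four cancer samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
subgroup_gene = [9.0, 8.5, 1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
flat_gene = [1.0, 1.1, 0.9, 1.2, 1.05, 0.95, 1.15, 0.85]
print(rs_score(subgroup_gene, labels), rs_score(flat_gene, labels))
```

In this toy example the gene that is elevated in only a subgroup of cancer samples receives a higher score than the gene with no differential expression, which is the behaviour the abstract asks of the ranking. Genes would then be ranked by such a score and the top of the list taken as candidates.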
