Finding associations among SNPs for prostate cancer using collaborative filtering

Data and Text Mining in Bioinformatics Pub Date : 2012-10-29 DOI:10.1145/2390068.2390080

Rohit Kugaonkar, A. Gangopadhyay, Y. Yesha, A. Joshi, Y. Yesha, M. Grasso, Mary Brady, N. Rishe


Citations: 4

Abstract

Prostate cancer is the second leading cause of cancer-related deaths among men. Because prostate cancer grows slowly, less aggressive cases sometimes do not require surgical treatment. Recent debates over prostate-specific antigen (PSA) screening have drawn new attention to the disease. Genome-based screening can potentially help assess the risk of developing prostate cancer. Because prostate cancer is complex, studying the entire genome is essential for finding genomic traits. However, studying all single nucleotide polymorphisms (SNPs) is costly, so it is essential to find tag SNPs that can represent other SNPs. Earlier methods that find tag SNPs from associations between SNPs either use SNP location information or are based on data with very few SNP markers per sample. Our study is based on 2300 samples with 550,000 SNPs each. We did not use SNP location information or any predefined standard cut-offs to find tag SNPs. Our approach uses collaborative filtering methods to find pairwise associations among SNPs and thereby list the top-N tag SNPs. We found 25 tag SNPs that have the highest similarities to other SNPs. In addition, we found 16 more SNPs that are highly correlated with the known high-risk SNPs associated with prostate cancer. We used some of these newly found SNPs with 5 different classification algorithms and observed some improvement in prostate cancer prediction accuracy over using the original known high-risk SNPs.
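
The idea of ranking SNPs by their pairwise similarity to other SNPs can be illustrated with a minimal sketch. The abstract does not say which similarity measure or scoring rule the study used, so everything below is an assumption made for illustration: genotypes are coded as minor-allele counts (0, 1, 2), SNPs are treated as the "items" of item-based collaborative filtering, similarity is plain cosine similarity, and each SNP is scored by its mean similarity to all other SNPs. The function name `top_n_tag_snps` and the toy matrix are hypothetical.

```python
import numpy as np


def top_n_tag_snps(genotypes, n):
    """Rank SNPs (columns) by mean cosine similarity to all other SNPs.

    Assumption for illustration only: cosine similarity on 0/1/2 allele
    counts, not necessarily the measure used in the study.
    """
    g = np.asarray(genotypes, dtype=float)  # shape: (samples, snps)
    norms = np.linalg.norm(g, axis=0)       # one norm per SNP column
    norms[norms == 0] = 1.0                 # guard against all-zero columns
    unit = g / norms                        # unit-length SNP vectors
    sim = unit.T @ unit                     # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)              # ignore self-similarity
    scores = sim.sum(axis=0) / (g.shape[1] - 1)
    order = np.argsort(-scores, kind="stable")[:n]
    return order.tolist(), scores


# Toy data: 6 samples (rows) x 4 SNPs (columns), hypothetical values.
genotypes = [
    [0, 0, 2, 1],
    [1, 1, 0, 0],
    [2, 2, 1, 0],
    [0, 0, 2, 2],
    [1, 1, 0, 1],
    [2, 2, 1, 0],
]

top, scores = top_n_tag_snps(genotypes, n=2)
print("top SNP indices:", top)
print("mean similarity per SNP:", np.round(scores, 3))
```

At the scale reported in the abstract (550,000 SNPs), a dense SNP-by-SNP similarity matrix would not fit in memory, so a practical version would need to compute similarities in blocks or keep only the strongest neighbours per SNP; the sketch ignores this.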
