Identifying Cancer Subtypes based on Somatic Mutation Profile
Sungchul Kim, Lee Sael, Hwanjo Yu
Data and Text Mining in Bioinformatics, 2014-11-07
DOI: 10.1145/2665970.2665980 (https://doi.org/10.1145/2665970.2665980)
Citations: 10
Abstract
Tumor stratification is a basic task in cancer genomics that supports a better understanding of tumor heterogeneity and better-targeted treatments. Various kinds of biological data can be used to stratify tumors, including gene expression and sequencing data. In this work, we use somatic mutation data. For each cancer type, two types of somatic mutation profile are generated, a binary profile and a weighted profile, and each is clustered using k-means with an appropriate distance measure to obtain cancer subtypes. Judged by the predictive power of the identified subtypes for clinical features and survival time, the binary somatic mutation profile with Jaccard distance (B-Jac) performed best, and the weighted somatic mutation profile with Euclidean distance (W-Euc) performed comparably. Both approaches performed significantly better than the typical usage of somatic mutation data, i.e., the binary somatic mutation profile with Euclidean distance (B-Euc).
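As a rough illustration of why the distance measure matters for binary mutation profiles, the sketch below compares Jaccard and Euclidean distance on a toy dataset. The patient names, the gene columns, and the convention that two mutation-free profiles have distance 0 are all assumptions made for the example; they are not taken from the paper, and the clustering step itself is not reproduced here.

```python
from math import sqrt


def jaccard_distance(a, b):
    """Jaccard distance between two binary profiles (sequences of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)     # mutated in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # mutated in at least one
    # Assumption: two profiles with no mutations at all count as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: one row per patient, one column per gene (1 = mutated).
profiles = {
    "p1": [1, 1, 0, 0, 0, 0],
    "p2": [1, 1, 1, 0, 0, 0],
    "p3": [0, 0, 0, 0, 0, 1],
    "p4": [0, 0, 0, 0, 0, 0],
}

for left, right in [("p1", "p2"), ("p3", "p4")]:
    jac = jaccard_distance(profiles[left], profiles[right])
    euc = euclidean_distance(profiles[left], profiles[right])
    print(f"{left} vs {right}: Jaccard={jac:.3f}, Euclidean={euc:.3f}")
```

In this toy case both pairs differ in exactly one gene, so their Euclidean distances are equal. Jaccard distance separates them: p1 and p2 share two of three mutated genes, while p3 and p4 share none. This is only one plausible intuition for sparse binary data, not a restatement of the authors' analysis.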