Comparison between Complete and Ward’s Linkage Method in Hierarchical Clustering Analysis on Cancer Omics Dataset
Chen Xinyi
2022 10th International Conference on Bioinformatics and Computational Biology (ICBCB), published 2022-05-13
DOI: https://doi.org/10.1109/icbcb55259.2022.9802487
Citations: 0
Abstract
Diseases, with cancer as a prominent example, can arise from a multitude of genetic and epigenetic changes. Studying gene expression profiles of tumor samples from cancer patients can reveal novel cancer subtypes. As analytical approaches have developed, clustering methods have become widely used on high-dimensional biomedical data, such as omics data, to find groups of samples with similar profiles and to identify cancer subtypes. In our study, we applied hierarchical clustering to high-dimensional mRNA-seq data to cluster cancer subtypes. Our focus is to compare the performance of two linkage methods in hierarchical clustering, the complete method and Ward’s method, and to investigate which dataset characteristics make one linkage measure more suitable than the other. Our results show that for dispersed datasets (kurtosis > 0.1, CV > 5), Ward’s method performs better than the complete method. Conversely, the complete method achieves more accurate clustering results than Ward’s method on relatively more aggregated data (kurtosis < 0.1, CV < 5).
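
The comparison described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' pipeline: it assumes SciPy's `linkage` and `fcluster`, uses synthetic data in place of mRNA-seq profiles, scores clusters with a simple purity measure (the abstract does not say which accuracy metric was used), and pools all matrix entries when computing kurtosis and CV (the paper may compute them per gene or per sample). The helper names `dispersion_summary`, `cluster_labels`, and `purity` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import kurtosis


def dispersion_summary(X):
    """Return (excess kurtosis, CV) over all values in X.

    Assumption: every entry of the matrix is pooled; CV is taken to mean
    standard deviation divided by mean.
    """
    values = np.asarray(X, dtype=float).ravel()
    k = float(kurtosis(values))  # Fisher (excess) kurtosis is SciPy's default
    cv = float(values.std(ddof=1) / values.mean())
    return k, cv


def cluster_labels(X, method, n_clusters):
    """Build a hierarchical tree with the given linkage and cut it into flat clusters."""
    Z = linkage(X, method=method, metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def purity(true_labels, pred_labels):
    """Fraction of samples that belong to the majority true class of their cluster."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    correct = 0
    for c in np.unique(pred_labels):
        members = true_labels[pred_labels == c]
        correct += np.bincount(members).max()
    return correct / len(true_labels)


# Hypothetical toy data: 3 "subtypes", 20 samples each, 50 features.
rng = np.random.default_rng(0)
centers = rng.normal(10.0, 5.0, size=(3, 50))
true = np.repeat(np.arange(3), 20)
X = centers[true] + rng.normal(0.0, 1.0, size=(60, 50))

k, cv = dispersion_summary(X)
scores = {m: purity(true, cluster_labels(X, m, 3)) for m in ("complete", "ward")}
print(f"kurtosis={k:.3f}, CV={cv:.3f}")
print(scores)
```

On a real dataset, the kurtosis and CV values could be checked against the thresholds quoted in the abstract (0.1 and 5) before choosing between `"complete"` and `"ward"`.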