Evaluation of agreement between common clustering strategies for DNA methylation-based subtyping of breast tumours.
Elaheh Zarean, Shuai Li, Ee Ming Wong, Enes Makalic, Roger L Milne, Graham G Giles, Catriona McLean, Melissa C Southey, Pierre-Antoine Dugué
Epigenomics, pages 105-114. Journal Article. Published 2025-02-01 (Epub 2024-12-23).
DOI: 10.1080/17501911.2024.2441653 (https://doi.org/10.1080/17501911.2024.2441653)
Citations: 0
Abstract
Aims: Clustering algorithms have been widely applied to tumor DNA methylation datasets to define methylation-based cancer subtypes. This study aimed to evaluate the agreement between subtypes obtained from common clustering strategies.
Materials & methods: We used tumor DNA methylation data from 409 women with breast cancer from the Melbourne Collaborative Cohort Study (MCCS) and 781 breast tumors from The Cancer Genome Atlas (TCGA). Agreement was assessed using the adjusted Rand index (ARI) for various combinations of number of CpGs, number of clusters and clustering algorithms (hierarchical, K-means, partitioning around medoids, and recursively partitioned mixture models).
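As an illustration of the agreement measure named above, the following is a minimal sketch of the adjusted Rand index computed from two cluster assignments, then applied to every pair of strategies. It is not the authors' code: the function name `adjusted_rand_index`, the strategy names, and the toy labels for six tumors are hypothetical and chosen only to show the calculation.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both assignments must cover the same samples")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no sample pairs to disagree on

    # Pairs of samples placed together by both assignments (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together by each assignment on its own (row/column totals).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2        # agreement if the partitions matched
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both assignments use a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical cluster labels for six tumors from three strategies (toy data).
assignments = {
    "hierarchical_k2": [0, 0, 0, 1, 1, 1],
    "kmeans_k2": [1, 1, 1, 0, 0, 0],  # same partition, labels swapped
    "pam_k3": [0, 0, 1, 1, 2, 2],
}

# Agreement for every pair of strategies.
pairwise_ari = {
    (name_a, name_b): adjusted_rand_index(assignments[name_a], assignments[name_b])
    for name_a, name_b in combinations(assignments, 2)
}

for pair, ari in pairwise_ari.items():
    print(pair, round(ari, 3))
```

Because the index is computed on pairs of samples rather than on label values, two assignments that describe the same partition under different label names score 1, while values near 0 indicate agreement no better than chance.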
Results: Inconsistent agreement patterns were observed for between-algorithm and within-algorithm comparisons, with generally poor to moderate agreement (ARI < 0.7). Results were qualitatively similar in the MCCS and TCGA, showing better agreement for a moderate number of CpGs and fewer clusters (K = 2). Restricting the analysis to CpGs that were differentially methylated between tumor and normal tissue did not result in higher agreement.
Conclusion: Our study highlights that common clustering strategies involving an arbitrary choice of algorithm, number of clusters and number of methylation sites are likely to identify different DNA methylation-based breast tumor subtypes.
About the journal:
Epigenomics provides the forum to address the rapidly progressing research developments in this ever-expanding field, and to report on the major challenges ahead and the critical advances that are propelling the science forward. The journal delivers this information in concise, at-a-glance article formats – invaluable to a time-constrained community.
Substantial developments in our current knowledge and understanding of genomics and epigenetics are constantly being made, yet this field is still in its infancy. Epigenomics provides a critical overview of the latest and most significant advances as they unfold and explores their potential application in the clinical setting.