Comparative stability analysis of mixed clustering algorithms for Malaysian dengue epidemiology using topological descriptors
Ooi Cheng Jie, Nur Fariha Syaqina Zulkepli, R U Gobithaasan, Mohd Shareduwan Mohd Kasihmuddin, Nurul Syafiah Abd Naeeim, Mohd Salmi Md Noorani, Kamarul Imran Musa
Acta Tropica, article 107769. Journal Article. Published 2025-10-01 (Epub 2025-08-06).
DOI: https://doi.org/10.1016/j.actatropica.2025.107769
Impact factor: 2.5; JCR: Q2 (Parasitology)
Abstract
Dengue fever remains a significant public health challenge in Malaysia, with high case numbers reported annually. Effective control and mitigation strategies require robust analytical tools to understand transmission dynamics and guide interventions. This study used topological data analysis (TDA) to extract structural features from epidemiological dengue time series by means of the Euler characteristic curve. In contrast to traditional clustering methods, which are applied directly to the dataset, the TDA-based approach encodes qualitative topological information that is robust to noise and effectively captures the underlying transmission dynamics. A comparative stability analysis was conducted by introducing controlled perturbations (noise) to the input data, and performance was assessed using multiple external validation metrics. Taking the k-medoids clustering algorithm with 30% noise as an example, the TDA-based clustering approach (using the Euler characteristic) demonstrated significantly greater robustness and stability across all evaluation metrics. The Adjusted Rand Index (ARI) improved by 252%, the Normalized Mutual Information (NMI) by 122.5%, and the Fowlkes-Mallows Index (FMI) by 67.3%. Further improvements were seen in V-Measure (122.5%), Homogeneity (80%), and Completeness (169.7%). These results demonstrate the noise resistance of the TDA-based clustering method and its enhanced ability, compared with traditional approaches, to preserve meaningful cluster structures under perturbation. This improved robustness facilitates deeper insights into disease transmission patterns and supports more effective data-driven public health analysis.
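The abstract does not say how the Euler characteristic curve is built or how the metrics are implemented, so the following is only a minimal sketch under stated assumptions. It treats a time series as a path graph with a sublevel-set filtration, which is one common construction and not necessarily the authors' own. It also computes the Adjusted Rand Index from its standard pair-counting definition. The function names `euler_characteristic_curve` and `adjusted_rand_index` and the toy data are illustrative, not taken from the paper.

```python
from collections import Counter
from math import comb


def euler_characteristic_curve(series, thresholds):
    """Euler characteristic of the sublevel sets of a time series.

    Assumption (not from the paper): the series is a path graph in which
    vertex i enters the filtration at series[i] and the edge (i, i+1)
    enters at max(series[i], series[i+1]). For each threshold t,
    chi(t) = #vertices - #edges, which here equals the number of
    connected components of the sublevel set.
    """
    curve = []
    for t in thresholds:
        vertices = sum(1 for v in series if v <= t)
        edges = sum(1 for a, b in zip(series, series[1:]) if max(a, b) <= t)
        curve.append(vertices - edges)
    return curve


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must have the same length")
    if n < 2:
        return 1.0  # trivial case: no pairs to disagree on
    # Pairs of items placed together in both labelings.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # both labelings are trivial (e.g. one cluster each)
    return (together - expected) / (maximum - expected)


# Toy weekly case counts (made-up numbers, for illustration only).
toy_series = [3, 1, 4, 1, 5]
toy_curve = euler_characteristic_curve(toy_series, thresholds=range(6))
print(toy_curve)  # [0, 2, 2, 2, 1, 1]
```

In a stability study of the kind described, each region's curve would serve as a feature vector for clustering, and `adjusted_rand_index` would compare the clusters found on noisy data with a reference partition. The other reported metrics (NMI, FMI, V-Measure, Homogeneity, Completeness) play the same role and are omitted here for brevity.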
About the journal:
Acta Tropica is an international journal on infectious diseases that covers public health sciences and biomedical research, with particular emphasis on topics relevant to human and animal health in the tropics and subtropics.