A novel method for subgroup discovery in precision medicine based on topological data analysis

IF 3.3 | Zone 3 (Medicine) | JCR Q2 MEDICAL INFORMATICS

BMC Medical Informatics and Decision Making, 25(1): 139. Pub Date: 2025-03-19. DOI: 10.1186/s12911-025-02852-9

PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921513/pdf/

Ciara F Loughrey, Sarah Maguire, Paweł Dłotko, Lu Bai, Nick Orr, Anna Jurek-Loughrey


Citations: 0



Background: The Mapper algorithm is a topological data-mining tool that can help us obtain a higher-level understanding of disease by visualising the structure of patient data as a similarity graph. It has previously been applied successfully to the exploratory analysis of cancer data, leading to several significant subgroup discoveries. In practice, using the Mapper algorithm requires setting multiple parameters, and the resulting graph must then be analysed manually in light of the research question at hand. The literature has highlighted that Mapper's parameters have a significant impact on the shape of the output graph and that there is no established way to select their optimal values. Hence, when using the Mapper algorithm, several parameter values, and consequently several output graphs, need to be studied. This prevents routine application of the Mapper algorithm in real-world settings.
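In the standard formulation of Mapper, a filter function maps each patient to a value, the range of that value is covered by overlapping intervals, the patients falling into each interval are clustered, and every cluster becomes a node; two nodes are joined when their clusters share a patient. The pure-Python sketch below illustrates that pipeline and why its output depends on parameter choices. It is not the authors' implementation: the names `mapper` and `single_linkage_clusters`, the parameters `n_intervals`, `overlap` and `eps`, the one-dimensional interval cover, and single-linkage clustering with a fixed distance threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations


def single_linkage_clusters(indices, points, eps):
    """Group `indices` into connected components of the eps-neighbourhood graph."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            # Every unvisited point within eps of point i joins the same cluster.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                component.add(j)
                stack.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filter_values, n_intervals=10, overlap=0.3, eps=0.5):
    """Build a Mapper graph and return (nodes, edges).

    nodes: list of frozensets of point indices (one per cluster per interval);
    edges: set of (a, b) node-index pairs whose clusters share at least one point.
    """
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / n_intervals or 1.0  # avoid zero-width intervals for a constant filter
    nodes = []
    for k in range(n_intervals):
        # Widen each base interval by overlap * length on both sides so neighbours overlap.
        start = lo + k * length - overlap * length
        end = lo + (k + 1) * length + overlap * length
        members = [i for i, f in enumerate(filter_values) if start <= f <= end]
        nodes.extend(single_linkage_clusters(members, points, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


# Usage: ten evenly spaced points on a line, filtered by their x-coordinate.
points = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper(points, [p[0] for p in points], n_intervals=3, overlap=0.1, eps=1.0)
print(len(nodes), sorted(edges))  # 3 [(0, 1), (1, 2)]
```

Re-running the same data with a different `n_intervals`, `overlap` or `eps` can merge, split or disconnect nodes, which is the parameter sensitivity described in the Background paragraph.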

Methods: We propose a new algorithm for subgroup discovery within the Mapper graph. We refer to this task as hotspot detection, as the algorithm is designed to identify homogeneous and geometrically compact subsets of patients that are distinct with respect to their clinical or molecular profiles (e.g. survival). Furthermore, we propose using the existence of a hotspot as a criterion when searching the parameter space, which addresses one of the key limitations of the Mapper algorithm (parameter selection).
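The abstract does not spell out the hotspot criterion, so the sketch below uses a hypothetical one purely to illustrate the idea of treating hotspot existence as a filter during parameter search. Nodes whose outcome rate reaches `node_threshold` stand in for "homogeneous", a connected community of such nodes stands in for "geometrically compact", and the community counts as a hotspot if it covers at least `min_patients` patients and its pooled outcome rate exceeds the cohort baseline by at least `min_lift`. The functions `find_hotspots` and `search_parameters`, their parameters and default thresholds are all assumptions, not the authors' definitions; `build_graph` is any callable that returns `(nodes, edges)` for one parameter combination.

```python
from itertools import product


def find_hotspots(nodes, edges, outcome, node_threshold=0.6, min_patients=5, min_lift=0.2):
    """Hypothetical hotspot criterion on a Mapper-style graph.

    nodes: list of non-empty sets of patient indices; edges: iterable of (a, b) node pairs;
    outcome: list of 0/1 labels per patient (e.g. 1 = poor prognosis).
    Returns one set of patient indices per detected hotspot.
    """
    baseline = sum(outcome) / len(outcome)
    rate = [sum(outcome[p] for p in n) / len(n) for n in nodes]
    hot = {i for i, r in enumerate(rate) if r >= node_threshold}
    # Adjacency restricted to edges between two "hot" nodes.
    adj = {i: set() for i in hot}
    for a, b in edges:
        if a in hot and b in hot:
            adj[a].add(b)
            adj[b].add(a)
    hotspots, seen = [], set()
    for start in hot:
        if start in seen:
            continue
        # Depth-first search collects one connected community of hot nodes.
        stack, community = [start], set()
        while stack:
            i = stack.pop()
            if i in community:
                continue
            community.add(i)
            stack.extend(adj[i] - community)
        seen |= community
        patients = set().union(*(nodes[i] for i in community))
        pooled = sum(outcome[p] for p in patients) / len(patients)
        if len(patients) >= min_patients and pooled - baseline >= min_lift:
            hotspots.append(patients)
    return hotspots


def search_parameters(build_graph, outcome, grid):
    """Grid search that keeps every parameter combination whose graph has a hotspot."""
    keys = list(grid)
    accepted = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        nodes, edges = build_graph(**params)
        if find_hotspots(nodes, edges, outcome):
            accepted.append(params)
    return accepted
```

In a real analysis, `build_graph` would wrap a Mapper implementation and `grid` would enumerate its parameters; the search then returns only those configurations whose graph exposes a distinct subgroup, rather than leaving every graph to manual inspection.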

Results: Two experiments were performed to demonstrate the efficacy of the algorithm: an artificial hotspot in the Two Circles dataset, and a real-world case study of subgroup discovery in oestrogen receptor-positive (ER+) breast cancer. Our hotspot detection algorithm successfully identified graphs containing homogeneous communities of nodes within the Two Circles dataset. When the method was applied to gene expression data from ER+ breast cancer patients, it identified appropriate parameters for generating a Mapper graph that revealed a hotspot of ER+ patients with poor prognosis and characteristic gene expression patterns. This finding was subsequently confirmed in an independent breast cancer dataset.

Conclusions: Our proposed method can be effectively applied to subgroup discovery in pathology data. It allows us to find optimal parameters for the Mapper algorithm, bridging the gap between the algorithm's potential and translational research.


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.