Andrea Campagner , Luca Marconi , Edoardo Bianchi , Beatrice Arosio , Paolo Rossi , Giorgio Annoni , Tiziano Angelo Lucchi , Nicola Montano , Federico Cabitza
Journal of Biomedical Informatics, Volume 165, Article 104799. Published 2025-03-19. DOI: 10.1016/j.jbi.2025.104799
https://www.sciencedirect.com/science/article/pii/S1532046425000280
Uncovering hidden subtypes in dementia: An unsupervised machine learning approach to dementia diagnosis and personalization of care
Objective:
Dementia is a growing public health challenge that affects an increasing number of individuals. It encompasses a broad spectrum of cognitive impairments, ranging from mild to severe stages, each demanding a different level of care. Current diagnostic approaches often treat dementia as a uniform condition, potentially overlooking clinically significant subtypes, which limits the effectiveness of treatment and care strategies. This study seeks to address the limitations of traditional diagnostic methods by applying unsupervised machine learning techniques to a large, multi-modal dataset of dementia patients (spanning clinical, demographic, gene expression and protein concentration data), with the aim of identifying distinct subtypes within the population. The primary focus is on differentiating between mild and severe stages of dementia to improve diagnostic accuracy and personalize treatment plans.
Methods:
The dataset analyzed included 911 individuals, described by 157 multi-modal characteristics, encompassing clinical, genomic, proteomic and sociodemographic features. After handling missing data, the dataset was reduced to 561 rows and 135 columns. Various dimensionality reduction techniques were applied to improve the feature-to-patient ratio, and unsupervised clustering methods were employed to identify potential subtypes. The main novelty of our methodology lies in combining techniques from high-dimensional statistical inference, multi-modal dimensionality reduction and clustering analysis, in order to model the multi-modal nature of the data appropriately and ensure clinical relevance.
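The abstract does not name the specific missing-data rule, dimensionality reduction techniques, or clustering algorithms used, so the following is only a minimal sketch of the general shape of such a pipeline, not the authors' method. It assumes a simple column-then-row filter for missing values, z-score standardization, and plain k-means with a deterministic farthest-first initialization. The dimensionality reduction step is left out because the source gives no detail on it. All function names, the 0.3 threshold, and the toy data are hypothetical.

```python
import math


def drop_missing(rows, max_col_missing=0.3):
    """Drop columns with too many gaps, then rows that still contain a gap.

    The 0.3 threshold is an illustrative assumption, not a value from the study.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    keep_cols = [
        j for j in range(n_cols)
        if sum(r[j] is None for r in rows) / n_rows <= max_col_missing
    ]
    reduced = [[r[j] for j in keep_cols] for r in rows]
    complete = [r for r in reduced if all(v is not None for v in r)]
    return complete, keep_cols


def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first initialization (an assumed choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centre if a cluster ends up empty.
            updated.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centers[c]
            )
        if updated == centers:
            break
        centers = updated
    return labels


def feature_to_patient_ratio(n_features, n_patients):
    """Number of features per patient; lower is easier to model."""
    return n_features / n_patients


# Hypothetical toy table: None marks a missing measurement.
toy = [
    [1.0, 2.0, None, 0.9],
    [1.1, 1.9, None, 1.0],
    [0.9, 2.1, 5.0, 1.1],
    [8.0, 9.0, None, 7.9],
    [8.2, 8.8, None, None],
    [7.9, 9.1, 4.0, 8.1],
]

complete, keep_cols = drop_missing(toy)
labels = kmeans(standardize(complete), k=2)
print(keep_cols)  # [0, 1, 3]
print(labels)     # [0, 0, 0, 1, 1]
print(round(feature_to_patient_ratio(157, 911), 3))  # 0.172
print(round(feature_to_patient_ratio(135, 561), 3))  # 0.241
```

On the counts reported above, the feature-to-patient ratio rises from about 0.17 (157/911) to about 0.24 (135/561) once incomplete records are removed, which is consistent with the stated need for dimensionality reduction before clustering.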
Results:
The analysis revealed distinct clusters within the dementia population, each characterized by specific clinical and demographic profiles. These profiles included variations in biomarkers, cognitive scores, and disability levels. The findings suggest the presence of previously unrecognized subgroups, distinguished by their genomic, proteomic, and clinical characteristics.
Conclusion:
This study demonstrates that unsupervised machine learning can effectively identify clinically relevant subtypes of dementia, with important implications for diagnosis and personalized treatment. Further research is required to validate these findings and investigate their potential to improve patient outcomes.
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.