A divisive hierarchical clustering methodology for enhancing the ensemble prediction power in large scale population studies: the ATHLOS project.

IF 4.7 3区 医学 Q1 MEDICAL INFORMATICS
Health Information Science and Systems Pub Date : 2022-04-18 eCollection Date: 2022-12-01 DOI:10.1007/s13755-022-00171-1
Petros Barmpas, Sotiris Tasoulis, Aristidis G Vrahatis, Spiros V Georgakopoulos, Panagiotis Anagnostou, Matthew Prina, José Luis Ayuso-Mateos, Jerome Bickenbach, Ivet Bayes, Martin Bobak, Francisco Félix Caballero, Somnath Chatterji, Laia Egea-Cortés, Esther García-Esquinas, Matilde Leonardi, Seppo Koskinen, Ilona Koupil, Andrzej Paja K, Martin Prince, Warren Sanderson, Sergei Scherbov, Abdonas Tamosiunas, Aleksander Galas, Josep Maria Haro, Albert Sanchez-Niubo, Vassilis P Plagianakos, Demosthenes Panagiotakos
{"title":"A divisive hierarchical clustering methodology for enhancing the ensemble prediction power in large scale population studies: the ATHLOS project.","authors":"Petros Barmpas, Sotiris Tasoulis, Aristidis G Vrahatis, Spiros V Georgakopoulos, Panagiotis Anagnostou, Matthew Prina, José Luis Ayuso-Mateos, Jerome Bickenbach, Ivet Bayes, Martin Bobak, Francisco Félix Caballero, Somnath Chatterji, Laia Egea-Cortés, Esther García-Esquinas, Matilde Leonardi, Seppo Koskinen, Ilona Koupil, Andrzej Paja K, Martin Prince, Warren Sanderson, Sergei Scherbov, Abdonas Tamosiunas, Aleksander Galas, Josep Maria Haro, Albert Sanchez-Niubo, Vassilis P Plagianakos, Demosthenes Panagiotakos","doi":"10.1007/s13755-022-00171-1","DOIUrl":null,"url":null,"abstract":"<p><p>The ATHLOS cohort is composed of several harmonized datasets of international groups related to health and aging. As a result, the Healthy Aging index has been constructed based on a selection of variables from 16 individual studies. In this paper, we consider additional variables found in ATHLOS and investigate their utilization for predicting the Healthy Aging index. For this purpose, motivated by the volume and diversity of the dataset, we focus our attention upon data clustering, where unsupervised learning is utilized to enhance prediction power. Thus we show the predictive utility of exploiting hidden data structures. In addition, we demonstrate that imposed computation bottlenecks can be surpassed when using appropriate hierarchical clustering, within a clustering for ensemble classification scheme, while retaining prediction benefits. We propose a complete methodology that is evaluated against baseline methods and the original concept. The results are very encouraging suggesting further developments in this direction along with applications in tasks with similar characteristics. A straightforward open source implementation for the R project is also provided (https://github.com/Petros-Barmpas/HCEP).</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s13755-022-00171-1.</p>","PeriodicalId":46312,"journal":{"name":"Health Information Science and Systems","volume":"10 1","pages":"6"},"PeriodicalIF":4.7000,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9013733/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health Information Science and Systems","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s13755-022-00171-1","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/12/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0

Abstract

The ATHLOS cohort is composed of several harmonized datasets of international groups related to health and aging. As a result, the Healthy Aging index has been constructed based on a selection of variables from 16 individual studies. In this paper, we consider additional variables found in ATHLOS and investigate their utilization for predicting the Healthy Aging index. For this purpose, motivated by the volume and diversity of the dataset, we focus our attention upon data clustering, where unsupervised learning is utilized to enhance prediction power. Thus we show the predictive utility of exploiting hidden data structures. In addition, we demonstrate that imposed computation bottlenecks can be surpassed when using appropriate hierarchical clustering, within a clustering for ensemble classification scheme, while retaining prediction benefits. We propose a complete methodology that is evaluated against baseline methods and the original concept. The results are very encouraging suggesting further developments in this direction along with applications in tasks with similar characteristics. A straightforward open source implementation for the R project is also provided (https://github.com/Petros-Barmpas/HCEP).

Supplementary information: The online version contains supplementary material available at 10.1007/s13755-022-00171-1.

用于提高大规模人口研究中集合预测能力的分裂分层聚类方法:ATHLOS项目。
ATHLOS队列由几个与健康和老龄化相关的国际群体的协调数据集组成。因此,健康老龄化指数是基于从16项单独研究中选择的变量构建的。在本文中,我们考虑了ATHLOS中发现的其他变量,并研究了它们在预测健康老龄化指数中的应用。为此,由于数据集的数量和多样性,我们将注意力集中在数据聚类上,其中使用无监督学习来增强预测能力。因此,我们展示了利用隐藏数据结构的预测效用。此外,我们还证明,在集成分类方案的聚类中,使用适当的分层聚类可以超越强加的计算瓶颈,同时保留预测优势。我们提出了一个完整的方法,对基线方法和原始概念进行评估。结果非常令人鼓舞,表明在这一方向上的进一步发展以及在具有类似特征的任务中的应用。还提供了R项目的直接开源实现(https://github.com/Petros-Barmpas/HCEP).Supplementary信息:在线版本包含补充材料,可在10.1007/s13755-022-00171-1获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
11.30
自引率
5.00%
发文量
30
期刊介绍: Health Information Science and Systems is a multidisciplinary journal that integrates artificial intelligence/computer science/information technology with health science and services, embracing information science research coupled with topics related to the modeling, design, development, integration and management of health information systems, smart health, artificial intelligence in medicine, and computer aided diagnosis, medical expert systems. The scope includes: i.) smart health, artificial Intelligence in medicine, computer aided diagnosis, medical image processing, medical expert systems ii.) medical big data, medical/health/biomedicine information resources such as patient medical records, devices and equipments, software and tools to capture, store, retrieve, process, analyze, optimize the use of information in the health domain, iii.) data management, data mining, and knowledge discovery, all of which play a key role in decision making, management of public health, examination of standards, privacy and security issues, iv.) development of new architectures and applications for health information systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信