Matteo Leghissa, Álvaro Carrera, Carlos Á. Iglesias
International Journal of Medical Informatics, Volume 192, Article 105603. Published 2024-08-19. DOI: 10.1016/j.ijmedinf.2024.105603. Source: https://www.sciencedirect.com/science/article/pii/S1386505624002661
FRELSA: A dataset for frailty in elderly people originated from ELSA and evaluated through machine learning models
Background
Frailty is an age-related syndrome characterized by loss of strength and exhaustion and associated with multi-morbidity. Early detection and prediction of the appearance of frailty could help older people age better and prevent them from needing invasive and expensive treatments. Machine learning techniques show promising results in creating a medical support tool for such a task.
Methods
This study aims to create a dataset for machine learning-based frailty studies, using Fried's Frailty Phenotype definition. Starting from a longitudinal study on aging in the UK population, we defined a frailty label for each subject. We evaluated the definition by training seven different models to detect frailty, using data contemporaneous with those used for the definition. We then integrated data collected two years earlier to obtain prediction models with a 24-month horizon. Feature selection was performed using the MultiSURF algorithm, which ranks all features in order of relevance to the detection or prediction task.
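As a rough illustration of the labelling step, the sketch below applies the usual Fried Frailty Phenotype scoring rule: five criteria (weight loss, exhaustion, low physical activity, slowness, weakness), with no criteria met read as non-frail, one or two as pre-frail, and three or more as frail. All function and field names are hypothetical. The abstract does not say how each criterion is derived from the ELSA variables, so the inputs are assumed to be already-computed booleans. The pairing helper likewise only assumes that consecutive wave numbers are about two years apart, and is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FriedCriteria:
    """Five Fried phenotype criteria, assumed precomputed per subject (hypothetical names)."""
    weight_loss: bool
    exhaustion: bool
    low_activity: bool
    slowness: bool
    weakness: bool


def fried_label(c: FriedCriteria) -> str:
    """Map the number of criteria met to a three-class frailty label."""
    score = sum([c.weight_loss, c.exhaustion, c.low_activity, c.slowness, c.weakness])
    if score >= 3:
        return "frail"
    if score >= 1:
        return "pre-frail"
    return "non-frail"


def pair_for_prediction(features_by_wave: dict, labels_by_wave: dict, label_wave: int) -> dict:
    """Pair each subject's features from the previous wave with the label from label_wave.

    Assumes consecutive wave numbers are roughly two years apart (illustrative only).
    """
    earlier = features_by_wave[label_wave - 1]
    labels = labels_by_wave[label_wave]
    # Keep only subjects observed in both waves.
    return {sid: (earlier[sid], labels[sid]) for sid in earlier if sid in labels}
```

For detection, features and label would come from the same wave. For the 24-month prediction setting, `pair_for_prediction` shows one way to align earlier features with a later label.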
Results
We present a new frailty dataset of 5303 subjects and more than 6500 available features. It is publicly available, provided one has access to the original English Longitudinal Study of Ageing dataset. The dataset is balanced after grouping frailty with pre-frailty, and it is suitable for multiclass or binary classification and prediction problems. The seven tested architectures performed similarly, forming a solid baseline that can be improved with future work. Linear regression achieved the best F-score and AUROC in detection and prediction tasks.
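The binary grouping and the two reported metrics can be sketched in plain Python as follows. This is a minimal illustration, not the authors' evaluation code. It assumes that "F-score" means the F1 score of the positive (frail or pre-frail) class, computes AUROC as the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one, and uses hypothetical function names throughout.

```python
def to_binary(label: str) -> int:
    """Group frail with pre-frail (1) against non-frail (0)."""
    return 0 if label == "non-frail" else 1


def f1_score(y_true: list, y_pred: list) -> float:
    """F1 of the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true: list, scores: list) -> float:
    """Pairwise AUROC; tied positive/negative scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in the number of subjects, which is acceptable for illustration. A rank-based implementation would be preferable at the scale of thousands of subjects.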
Conclusions
Creating new frailty-annotated datasets of this size is necessary to develop and improve frailty prediction techniques. We have shown that our dataset can be used to study and test machine learning models to detect and predict frailty. Future work should improve model architectures and performance, consider explainability, and possibly enrich the dataset with older waves.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.