Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience.

ArXiv publication date: 2024-09-13
Rongguang Wang, Guray Erus, Pratik Chaudhari, Christos Davatzikos
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419182/pdf/
Citations: 0

Abstract

Machine learning (ML) is revolutionizing many areas of engineering and science, including healthcare. However, it also faces a reproducibility crisis, especially in healthcare: ML models that are carefully constructed from, and evaluated on, data from one part of the population may not generalize well to data from a different population group, or to data acquired with different instrument settings and acquisition protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ), and brain aging. Subjects are stratified by attributes such as sex, age group, race, and clinical cohort, and we develop a weighted empirical risk minimization approach that optimally combines data from a source group (e.g., one sex or age group) with a small fraction (10%) of data from a target group (e.g., the other sex or a different age group) to make predictions on the target group. We apply this method to multi-source data from 15,363 individuals in 20 neuroimaging studies to build ML models for diagnosing AD and SZ and for estimating brain age. We found that this approach achieves substantially better accuracy than existing domain adaptation techniques: on all target groups, it obtains an area under the curve (AUC) greater than 0.95 for AD classification, an AUC greater than 0.7 for SZ classification, and a mean absolute error (MAE) below 5 years for brain age prediction, demonstrating robustness to variations in scanners, protocols, and demographic or clinical characteristics. In some cases, it even outperforms training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks, such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models yield new clinical insights regarding their correlations with neurophysiological tests. In summary, we present a relatively simple methodology, along with ample experimental evidence, that supports good generalization of ML models to new datasets and patient cohorts.
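The weighted empirical risk minimization idea can be illustrated with a small sketch. The abstract does not specify the model, the loss, or how the optimal weighting is found, so everything below is an assumption for illustration, not the authors' implementation: a logistic-regression classifier is trained on the objective (1 - alpha) * (mean loss on the source group) + alpha * (mean loss on the target group), where alpha in [0, 1] is picked from a small grid by AUC on held-out target data. The helper names (fit_weighted_erm, select_alpha, make_group) and the synthetic two-feature data, in which a constant offset stands in for a scanner or cohort shift, are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, xs):
    w, b = model
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in xs]


def fit_weighted_erm(source, target, alpha, lr=0.3, epochs=200, l2=1e-3):
    """Hypothetical sketch: logistic regression fit by full-batch gradient descent on
    (1 - alpha) * mean_source_loss + alpha * mean_target_loss (+ L2 on weights).
    `source` and `target` are lists of (features, label) pairs, label in {0, 1}.
    """
    dim = len((source or target)[0][0])
    w, b = [0.0] * dim, 0.0
    # Each group gets a fixed total weight, so a large source group cannot
    # swamp a small target group regardless of how many samples it has.
    groups = [(source, 1.0 - alpha), (target, alpha)]
    for _ in range(epochs):
        gw, gb = [l2 * wi for wi in w], 0.0
        for data, group_weight in groups:
            if not data or group_weight == 0.0:
                continue
            scale = group_weight / len(data)
            for x, y in data:
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                err = (p - y) * scale  # gradient of the log-loss w.r.t. the logit
                for j, xj in enumerate(x):
                    gw[j] += err * xj
                gb += err
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def auc(scores, labels):
    # Probability that a random positive outscores a random negative (ties count 1/2).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_alpha(source, target_train, target_val, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    # Assumed selection rule: keep the alpha with the best AUC on held-out target data.
    best = None
    for alpha in grid:
        model = fit_weighted_erm(source, target_train, alpha)
        score = auc(predict(model, [x for x, _ in target_val]),
                    [y for _, y in target_val])
        if best is None or score > best[0]:
            best = (score, alpha, model)
    return best


def make_group(rng, n, shift):
    # Synthetic two-feature data; `shift` mimics a scanner or cohort offset.
    data = []
    for _ in range(n):
        y = int(rng.random() < 0.5)
        x = [rng.gauss(1.0 if y else -1.0, 1.0) + shift, rng.gauss(shift, 1.0)]
        data.append((x, y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    source = make_group(rng, 300, shift=0.0)
    target = make_group(rng, 300, shift=1.5)
    # Train on 10% of the target group; use the rest to choose alpha.
    target_train, target_val = target[:30], target[30:]
    score, alpha, _ = select_alpha(source, target_train, target_val)
    print(f"chosen alpha={alpha}, target AUC={score:.3f}")
```

Giving each group a fixed total weight, rather than pooling all samples, keeps a large source group from dominating the small target slice; alpha = 0 reduces to source-only training and alpha = 1 to target-only training. In a real study, alpha would be chosen on target data kept separate from the final test set, since reusing the same data for both makes the reported score optimistic.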
