Diagnosing migraine from genome-wide genotype data: a machine learning analysis.

IF 10.6 · Zone 1 (Medicine) · Q1 Clinical Neurology
Brain · Pub Date: 2025-05-06 · DOI: 10.1093/brain/awaf172
Antonios Danelakis, Tjaša Kumelj, Bendik S Winsvold, Marte Helene Bjørk, Parashkev Nachev, Manjit Matharu, Dominic Giles, Erling Tronvik, Helge Langseth, Anker Stubberud
Citation count: 0

Abstract

Migraine has an assumed polygenic basis, but the genetic risk variants identified in genome-wide association studies explain only a proportion of the heritability. We aimed to develop machine learning models, capturing non-additive and interactive effects, to address the missing heritability. This was a cross-sectional population-based study of participants in the second and third Trøndelag Health Study. Individuals underwent genome-wide genotyping and were phenotyped based on validated modified criteria of the International Classification of Headache Disorders. Four datasets with an increasing number of genetic variants were created using different thresholds for linkage disequilibrium and univariate genome-wide association p-values. A series of machine learning and deep learning methods were optimized and evaluated. The genotype tools PLINK and LDPred2 were used for polygenic risk scoring. Models were trained on a partition of the dataset and tested in a hold-out set. The area under the receiver operating characteristic curve was used as the primary scoring metric. Classification by machine learning was statistically compared to that of polygenic risk scoring. Finally, we explored the biological functions of the variants unique to the machine learning approach. A total of 43,197 individuals (51% women), with a mean age of 54.6 years, were included in the modelling. A light gradient boosting machine performed best for the three smallest datasets (108, 7,771 and 7,840 variants), all with a hold-out test set area under the curve of 0.63. A multinomial naïve Bayes model performed best in the largest dataset (140,467 variants) with a hold-out test set area under the curve of 0.62. The models were statistically significantly superior to polygenic risk scoring (area under the curve 0.52 to 0.59) for all the datasets (p<0.001 to p=0.02).
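As a rough illustration of the two quantities compared above, the sketch below computes an additive polygenic score (a weighted sum of allele dosages) and the area under the receiver operating characteristic curve on a hold-out set. It is a minimal sketch, not the study's pipeline: the study used PLINK and LDPred2 for scoring, whereas the function names `polygenic_score` and `roc_auc`, the genotypes, the weights and the labels here are all hypothetical toy values.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Additive score: sum over variants of weight * allele dosage (0, 1 or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical hold-out data: 4 individuals x 3 variants, made-up effect weights.
genotypes = [[2, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0]]
weights = [0.5, 1.0, 0.25]
labels = [1, 1, 0, 0]  # 1 = migraine case, 0 = control

scores = [polygenic_score(g, weights) for g in genotypes]
auc = roc_auc(labels, scores)
print(scores, auc)
```

An AUC of 0.5 corresponds to chance-level ranking of cases against controls and 1.0 to perfect ranking, which is the scale on which the reported values of 0.52 to 0.63 should be read.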
Machine learning identified many of the same genes and pathways identified in genome-wide association studies, but also several unique pathways, mainly related to signal transduction and neurological function. Interestingly, pathways related to botulinum toxins, and pathways related to the calcitonin gene-related peptide receptor also emerged. This study suggests that migraine may follow a non-additive and interactive genetic causal structure, potentially best captured by complex machine learning models. Such structure may be concealed where the data dimensionality (high number of genetic variants) is insufficiently supported by the scale of available data, leaving a misleading impression of purely additive effects. Future machine learning models using substantially larger sample sizes could harness both the additive and the interactive effects, enhancing precision and offering deeper understanding of genetic interactions underlying migraine.
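The idea of a non-additive, interactive structure that an additive score cannot see can be illustrated with a deliberately artificial example. The sketch below assumes a pure two-variant interaction (an individual is a case only when exactly one of two binary variants is present); this phenotype rule, the variable names and the tiny cohort are hypothetical and are not taken from the study, whose models were gradient boosting and naïve Bayes classifiers rather than the hand-written score shown here.

```python
from itertools import product


def roc_auc(labels, scores):
    """Probability that a random case outscores a random control; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    total = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return total / (len(cases) * len(controls))


# Hypothetical toy cohort: every combination of two binary variants, once each.
genotypes = list(product([0, 1], repeat=2))

# Assumed pure-interaction phenotype: case only when exactly one variant is present.
labels = [a ^ b for a, b in genotypes]

# Equal-weight additive score, analogous in spirit to a polygenic risk score.
additive = [a + b for a, b in genotypes]

# Score with an added product term, i.e. an explicit interaction effect.
interactive = [a + b - 2 * a * b for a, b in genotypes]

auc_additive = roc_auc(labels, additive)
auc_interactive = roc_auc(labels, interactive)
print(auc_additive, auc_interactive)
```

In this toy setting the additive score ranks cases no better than chance, while the score with the product term separates them completely. Real genotype data would sit somewhere between these extremes, and detecting such terms among many thousands of variants needs far more samples than four.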
Source journal: Brain (Medicine, Clinical Neurology)
CiteScore: 20.30
Self-citation rate: 4.10%
Articles published: 458
Review time: 3-6 weeks
About the journal: Brain, a journal focused on clinical neurology and translational neuroscience, has been publishing landmark papers since 1878. The journal aims to expand its scope by including studies that shed light on disease mechanisms and innovative clinical trials for brain disorders. With a wide range of topics covered, the Editorial Board represents the international readership and diverse coverage of the journal. Accepted articles are promptly posted online, typically within a few weeks of acceptance. As of 2022, Brain holds an impact factor of 14.5, according to the Journal Citation Reports.