EnSCAN: an ensemble score for prioritizing causative variants of late-onset Alzheimer's disease across multiplatform GWASs.

IF 6.1 | Zone 3, Biology | Q1 | MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining Pub Date : 2025-03-04 DOI:10.1186/s13040-025-00436-x

Onur Erdogan, Cem Iyigun, Yeşim Aydın Son

Biodata Mining 18(1), 20 (2025-03-04). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881353/pdf/

Citations: 0

Abstract


EnSCAN: ENsemble Scoring for prioritizing CAusative variaNts across multiplatform GWASs for late-onset Alzheimer's disease.


Late-onset Alzheimer's disease (LOAD) is a progressive and complex neurodegenerative disorder of the aging population. LOAD is characterized by cognitive decline, such as deterioration of memory, loss of intellectual abilities, and impairment of other cognitive domains due to traumatic brain injuries. Alzheimer's disease (AD) has a complex genetic etiology that is still unclear, which limits its early or differential diagnosis. Genome-wide association studies (GWAS) enable the exploration of individual variants' statistical associations at candidate loci, but univariate analysis overlooks interactions between variants. Machine learning (ML) algorithms can capture hidden, novel, and significant patterns while considering nonlinear interactions between variants, which helps in understanding the genetic predisposition to complex genetic disorders. When datasets come from different genotyping platforms, majority voting cannot be applied because the attributes differ. Hence, a new post-ML ensemble approach was developed to select significant single-nucleotide variants (SNVs) across multiple genotyping platforms. We propose the EnSCAN framework, which uses a new algorithm to ensemble selected variants, even from different platforms, to prioritize candidate causative loci; this helps improve ML results by combining the prior information captured from each dataset. The proposed ensemble algorithm uses the chromosomal locations of SNVs, mapped to cytogenetic bands, along with the proximities between pairs and multimodel Random Forest (RF) validations to prioritize SNVs and candidate causative genes for LOAD. The scoring method is scalable and can be applied to any multiplatform genotyping study. We show how the proposed EnSCAN scoring algorithm prioritizes candidate causative variants related to LOAD across three GWAS datasets.
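To make the abstract's argument concrete, here is a minimal sketch of why SNV-level majority voting fails when platforms genotype different variants, and how grouping selected SNVs by chromosomal band and by pairwise proximity makes the selections comparable. It is not the EnSCAN algorithm itself: the band table, SNV identifiers, coordinates, distance threshold, and the simple platform-count ranking are all hypothetical, and the Random Forest validation step is omitted.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical band table: chromosome -> sorted list of (start, band_name).
# A real analysis would load cytogenetic band coordinates from a reference
# annotation; these names and positions are invented for illustration.
TOY_BANDS = {
    "chrA": [(0, "A.p1"), (1_000, "A.q1"), (2_000, "A.q2")],
}

# Hypothetical per-platform selections: (snv_id, chromosome, position).
SELECTED = {
    "platform1": [("snv1", "chrA", 1_050), ("snv2", "chrA", 2_500)],
    "platform2": [("snv3", "chrA", 1_120)],
    "platform3": [("snv4", "chrA", 1_900), ("snv5", "chrA", 300)],
}


def band_of(chrom, pos, bands=TOY_BANDS):
    """Return the band whose start is the largest one not exceeding pos."""
    starts = [start for start, _ in bands[chrom]]
    return bands[chrom][bisect_right(starts, pos) - 1][1]


def band_support(selected):
    """Map each band to the set of platforms with a selected SNV inside it."""
    support = defaultdict(set)
    for platform, snvs in selected.items():
        for _snv_id, chrom, pos in snvs:
            support[band_of(chrom, pos)].add(platform)
    return dict(support)


def close_pairs(selected, max_dist):
    """List cross-platform SNV pairs on one chromosome within max_dist bases."""
    flat = [(p, s) for p, snvs in selected.items() for s in snvs]
    pairs = []
    for i, (p1, (id1, c1, x1)) in enumerate(flat):
        for p2, (id2, c2, x2) in flat[i + 1:]:
            if p1 != p2 and c1 == c2 and abs(x1 - x2) <= max_dist:
                pairs.append((id1, id2, abs(x1 - x2)))
    return pairs


# SNV-level voting: the platforms share no SNV identifiers, so no variant
# can ever collect more than one vote.
id_sets = [{snv_id for snv_id, _, _ in snvs} for snvs in SELECTED.values()]
shared_ids = set.intersection(*id_sets)

# Band-level view: rank bands by how many platforms support them
# (ties broken alphabetically by band name).
support = band_support(SELECTED)
ranked = sorted(support, key=lambda band: (-len(support[band]), band))

print("shared SNV ids:", shared_ids)
print("bands ranked by platform support:", ranked)
print("close cross-platform pairs:", close_pairs(SELECTED, max_dist=100))
```

The count of supporting platforms is only a stand-in for a score; any weighting by proximity or model validation would have to follow the method described in the paper.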


Source journal

Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.90

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: 23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.