Genomic Anomaly Detection with Functional Data Analysis.

IF 2.8 | Zone 3 (Biology) | JCR Q2, GENETICS & HEREDITY
Genes Pub Date : 2025-06-15 DOI:10.3390/genes16060710
Ria Kanjilal, Andre Luiz Campelo Dos Santos, Sandipan Paul Arnab, Michael DeGiorgio, Raquel Assis
Genes, vol. 16, no. 6. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192579/pdf/
Citations: 0

Abstract


Background: Genetic variation provides a foundation for understanding evolution. With the rise of artificial intelligence, machine learning has emerged as a powerful tool for identifying genomic footprints of evolutionary processes through simulation-based predictive modeling. However, existing approaches require prior knowledge of the factors shaping genetic variation, whereas uncovering anomalous genomic regions regardless of their causes remains an equally important and complementary endeavor.

Methods: To address this problem, we introduce ANDES (ANomaly DEtection using Summary statistics), a suite of algorithms that apply statistical techniques to extract features for unsupervised anomaly detection. A key innovation of ANDES is its ability to account for autocovariation due to linkage disequilibrium by fitting curves to contiguous windows and computing their first and second derivatives, thereby capturing the "velocity" and "acceleration" of genetic variation. These features are then used to train models that flag biologically significant or artifactual regions.

Results: Application to human genomic data demonstrates that ANDES successfully detects anomalous regions that colocalize with genes under positive or balancing selection. Moreover, these analyses reveal a non-uniform distribution of anomalies, which are enriched in specific autosomes, intergenic regions, introns, and regions with low GC content, repetitive sequences, and poor mappability.

Conclusions: ANDES thus offers a novel, model-agnostic framework for uncovering anomalous genomic regions in both model and non-model organisms.
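The feature-extraction idea in the Methods can be illustrated with a minimal sketch. It is not the ANDES implementation: the abstract does not specify the curve family or the anomaly-detection models, so this example assumes a local quadratic least-squares fit over contiguous windows (yielding a level, a first derivative, and a second derivative) and swaps in a simple median/MAD outlier score as the unsupervised detector. The function names, the toy `diversity` values, and the 3.5 cutoff are all hypothetical choices made for illustration.

```python
from statistics import median


def local_quadratic_features(values, half_width=2):
    """Fit y = a + b*t + c*t**2 by least squares to every run of
    2*half_width + 1 contiguous windows (t = offset from the centre) and
    return one (level, velocity, acceleration) = (a, b, 2c) tuple per centre."""
    offsets = list(range(-half_width, half_width + 1))
    n = len(offsets)
    s2 = sum(t ** 2 for t in offsets)
    s4 = sum(t ** 4 for t in offsets)
    features = []
    for centre in range(half_width, len(values) - half_width):
        ys = [values[centre + t] for t in offsets]
        sy = sum(ys)
        sty = sum(t * y for t, y in zip(offsets, ys))
        st2y = sum(t * t * y for t, y in zip(offsets, ys))
        # Normal equations; odd moments of the symmetric offsets vanish.
        c = (n * st2y - s2 * sy) / (n * s4 - s2 ** 2)
        b = sty / s2
        a = (sy - c * s2) / n
        features.append((a, b, 2.0 * c))
    return features


def robust_scores(features):
    """Score each feature row by its largest median/MAD z-score.
    This is a stand-in detector, not the models used by ANDES."""
    scores = [0.0] * len(features)
    for column in zip(*features):
        med = median(column)
        # Floor the MAD so a perfectly flat background does not divide by zero.
        mad = max(median([abs(x - med) for x in column]), 1e-12)
        for i, x in enumerate(column):
            scores[i] = max(scores[i], abs(x - med) / (1.4826 * mad))
    return scores


# Toy data (assumed): a flat summary statistic across 21 windows, one sharp dip.
diversity = [1.0] * 21
diversity[10] = 0.2

half_width = 2
features = local_quadratic_features(diversity, half_width)
scores = robust_scores(features)
# Map feature rows back to window indices and flag the extreme ones.
flagged = [i + half_width for i, s in enumerate(scores) if s > 3.5]
print(flagged)
```

Because each feature row summarizes several neighbouring windows, the dip at one window influences the fitted level, slope, and curvature of every fit that contains it, which is one simple way to let correlated neighbouring windows inform the anomaly score.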

Source journal
Genes (GENETICS & HEREDITY)
CiteScore: 5.20
Self-citation rate: 5.70%
Articles published: 1975
Review time: 22.94 days
About the journal: Genes (ISSN 2073-4425) is an international, peer-reviewed open access journal that provides an advanced forum for studies related to genes, genetics and genomics. It publishes reviews, research articles, communications and technical notes. There is no restriction on the length of the papers and we encourage scientists to publish their results in as much detail as possible.