Tracing the genealogy origin of geographic populations based on genomic variation and deep learning

IF 3.6 · Zone 1 (Biology) · JCR Q2, BIOCHEMISTRY & MOLECULAR BIOLOGY
Bing Yang, Xin Zhou, Shanlin Liu
Molecular Phylogenetics and Evolution, vol. 198, Article 108142. Published 2024-07-02.
DOI: 10.1016/j.ympev.2024.108142
URL: https://www.sciencedirect.com/science/article/pii/S1055790324001349
Citations: 0

Abstract

Assigning a query individual animal or plant to its source population is a prime task in diverse applications related to organismal genealogy. Such efforts have conventionally relied on short DNA sequences analyzed within a phylogenetic framework. These methods are naturally constrained when the candidate source populations lack clear phylogenetic structure, a scenario that demands substantially more informative genetic signals. Recent advances in the cost-effective production of whole-genome sequences and in artificial intelligence have created an unprecedented opportunity to trace the population origin of essentially any given individual, provided the genome reference data are comprehensive and standardized. Here, we developed a convolutional neural network method that identifies population origins from genomic SNPs. Three empirical datasets (Asian honeybee, red fire ant, and chicken) and two simulated populations were used as proof of concept. Performance tests indicate that our method can accurately identify the genealogical origin of query individuals, with success rates ranging from >93% to 100%. We further showed that model accuracy can be significantly increased by refining the informative sites through F_ST filtering. Our method is robust to batch size and number of epochs, whereas model learning benefits from a properly chosen preset learning rate. Moreover, we report importance scores for key sites to support algorithm interpretability and credibility, an aspect that has been largely ignored. We anticipate that, by coupling genomics and deep learning, our method will find broad use in conservation and management applications involving natural resources, invasive pests and weeds, and the illegal trade in wildlife products.
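The abstract reports that accuracy improves when informative sites are refined by F_ST filtering, but it does not state which estimator or cutoff the authors used. The sketch below is therefore only an illustration under stated assumptions: biallelic SNPs with diploid genotypes coded 0/1/2 (alt-allele count), Nei's heterozygosity-based F_ST = (H_T - H_S) / H_T with populations weighted equally, and a hypothetical `threshold` parameter. All function names are illustrative, not taken from the paper.

```python
from statistics import mean


def allele_freq(genotypes):
    """Alt-allele frequency from diploid genotypes coded 0/1/2 (alt-allele count)."""
    return sum(genotypes) / (2 * len(genotypes))


def site_fst(pop_genotypes):
    """Nei-style F_ST = (H_T - H_S) / H_T for a single biallelic SNP.

    pop_genotypes: one list of genotypes per population at this site.
    Assumption: populations are weighted equally, with no sample-size correction.
    """
    freqs = [allele_freq(g) for g in pop_genotypes]
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population heterozygosity
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled frequency
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t  # monomorphic site -> 0


def filter_sites(genotype_matrix, labels, threshold):
    """Return indices of SNP columns whose F_ST is at least `threshold` (hypothetical cutoff).

    genotype_matrix: rows are reference individuals, columns are SNPs (0/1/2).
    labels: known population label for each row.
    """
    pops = sorted(set(labels))
    keep = []
    for j in range(len(genotype_matrix[0])):
        # Collect column j separately for each population.
        per_pop = [
            [row[j] for row, lab in zip(genotype_matrix, labels) if lab == pop]
            for pop in pops
        ]
        if site_fst(per_pop) >= threshold:
            keep.append(j)
    return keep
```

For example, with four reference individuals in two populations, `filter_sites([[0, 1, 2], [0, 1, 2], [2, 1, 0], [2, 1, 2]], ["A", "A", "B", "B"], 0.3)` keeps columns 0 (F_ST = 1) and 2 (F_ST = 1/3) and drops column 1 (F_ST = 0). Because query individuals have no population label, F_ST can only be computed on the labeled reference set, and the retained columns are then applied to the queries. Sample-size-corrected estimators such as Weir and Cockerham's or Hudson's would be the more usual choice on real data.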

Source journal: Molecular Phylogenetics and Evolution
Subject category: Biology, Evolutionary Biology
CiteScore: 7.50
Self-citation rate: 7.30%
Articles published: 249
Review time: 7.5 months
Journal description: Molecular Phylogenetics and Evolution is dedicated to bringing Darwin's dream within grasp: to "have fairly true genealogical trees of each great kingdom of Nature." The journal provides a forum for molecular studies that advance our understanding of phylogeny and evolution, further the development of phylogenetically more accurate taxonomic classifications, and ultimately bring a unified classification for all the ramifying lines of life. Phylogeographic studies will be considered for publication if they offer EXCEPTIONAL theoretical or empirical advances.