Popfinder: A Highly Effective Artificial Neural Network Package for Genetic Population Assignment.

IF 5.5, Zone 1 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
K Birchard, C Boccia, H Lounder, L Colston-Nepali, V L Friesen
DOI: 10.1111/1755-0998.14096 (https://doi.org/10.1111/1755-0998.14096)
Journal: Molecular Ecology Resources, e14096
Publication date: 2025-03-07
Publication type: Journal Article
Citations: 0

Abstract


The ability to assign biological samples to source populations with high accuracy and precision based on genetic variation is important for numerous applications from ecological studies through wildlife conservation to epidemiology. However, population assignment when genetic differentiation is low is challenging, and methods to address this problem are lacking. The application of artificial neural networks to population assignment using genomic data is highly promising. Here we present popfinder: a new, easy-to-use Python-based artificial neural network pipeline for genetic population assignment. We tested popfinder both with simulated genetic data from populations connected by varying levels of gene flow and with reduced-representation sequence data for three species of seabirds with weak to no population genetic structure. Popfinder was able to assign individuals to their source populations with high accuracy, precision and recall in most cases, including both simulated and empirical data sets, except in the empirical data set with the weakest population structure, where the comparator programs also performed poorly. Compared to other available software, popfinder was slower on the simulated data sets due to hyperparameter tuning and the fact that it does not reduce the dimensionality of the data set; however, all programs ran in seconds on empirical data sets. Additionally, popfinder provides a perturbation ranking method to help develop optimised SNP panels for genetic population assignment and is designed to be user-friendly. Finally, we caution users of all assignment programs to watch both for leakage of data during model training, which can lead to overfitting and inflation of performance metrics, and for unequal detection probabilities.
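The abstract mentions a perturbation ranking method for building SNP panels and warns about data leakage during training. The sketch below illustrates the general idea only and is not popfinder's implementation or API: a simple nearest-centroid classifier stands in for the neural network, the genotype data are simulated toy values, and every function name (`simulate_genotypes`, `split`, `perturbation_ranking`, `run_demo`) is hypothetical. Each SNP is shuffled in turn on held-out samples, and SNPs are ranked by how much assignment accuracy drops.

```python
import numpy as np


def simulate_genotypes(rng, n_per_pop=200, n_snps=10, informative_snp=0):
    """Toy data: two populations, genotypes coded 0/1/2 (alternate-allele counts).

    Only `informative_snp` differs in allele frequency (0.1 vs 0.9);
    every other SNP has frequency 0.5 in both populations.
    """
    freqs = np.full((2, n_snps), 0.5)
    freqs[0, informative_snp] = 0.1
    freqs[1, informative_snp] = 0.9
    X = np.vstack(
        [rng.binomial(2, freqs[pop], size=(n_per_pop, n_snps)) for pop in range(2)]
    ).astype(float)
    y = np.repeat([0, 1], n_per_pop)
    return X, y


def split(X, y, rng, test_fraction=0.3):
    """Split BEFORE any fitting so test samples never influence the model."""
    order = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    return X[train], y[train], X[test], y[test]


def fit_centroids(X, y):
    """Stand-in classifier (assumption): mean genotype vector per population."""
    return np.stack([X[y == k].mean(axis=0) for k in np.unique(y)])


def accuracy(centroids, X, y):
    """Assign each sample to the nearest centroid and score against labels."""
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((dist.argmin(axis=1) == y).mean())


def perturbation_ranking(centroids, X_test, y_test, rng, n_repeats=10):
    """Rank SNPs by the mean accuracy drop when that SNP alone is shuffled."""
    base = accuracy(centroids, X_test, y_test)
    drops = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += base - accuracy(centroids, X_perm, y_test)
    drops /= n_repeats
    return np.argsort(-drops), drops  # most important SNP first


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    X, y = simulate_genotypes(rng)
    X_train, y_train, X_test, y_test = split(X, y, rng)
    centroids = fit_centroids(X_train, y_train)
    base_acc = accuracy(centroids, X_test, y_test)
    ranking, drops = perturbation_ranking(centroids, X_test, y_test, rng)
    return ranking, drops, base_acc


if __name__ == "__main__":
    ranking, drops, base_acc = run_demo()
    print("held-out accuracy:", round(base_acc, 3))
    print("SNP ranking (most to least important):", ranking.tolist())
```

In this toy setup the single differentiated SNP should rank first, because shuffling it removes nearly all of the signal, while shuffling a neutral SNP barely changes accuracy. Measuring the drop on held-out samples rather than on the training set is one way to avoid the inflated metrics that the abstract attributes to data leakage.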

Source journal: Molecular Ecology Resources (Biology, Evolutionary Biology)
CiteScore: 15.60
Self-citation rate: 5.20%
Articles published: 170
Review time: 3 months
About the journal: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.