GenoTools: An Open-Source Python Package for Efficient Genotype Data Quality Control and Analysis.

Impact factor 2.1; Region 3 (Biology); JCR Q3, GENETICS & HEREDITY
Dan Vitale, Mathew J Koretsky, Nicole Kuznetsov, Samantha Hong, Jessica Martin, Mikayla James, Mary B Makarious, Hampton Leonard, Hirotaka Iwaki, Faraz Faghri, Cornelis Blauwendraat, Andrew B Singleton, Yeajin Song, Kristin Levine, Ashwin Ashok Kumar Sreelatha, Zih-Hua Fang, Mike Nalls
Journal: G3: Genes|Genomes|Genetics
Publication date: 2024-11-20
Publication type: Journal Article
DOI: 10.1093/g3journal/jkae268 (https://doi.org/10.1093/g3journal/jkae268)
Citations: 0

Abstract

GenoTools: An Open-Source Python Package for Efficient Genotype Data Quality Control and Analysis.

GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association studies (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization specified to the user's genotyping or sequencing platform. As the genotype processing engine that powers several large initiatives, including the NIH's Center for Alzheimer's and Related Dementias (CARD) and the Global Parkinson's Genetics Program (GP2), GenoTools was used to process and analyze the UK Biobank and major Alzheimer's Disease (AD) and Parkinson's Disease (PD) datasets with over 400,000 genotypes from arrays and 5,000 whole genome sequencing (WGS) samples and has led to novel discoveries in diverse populations. It has provided replicable ancestry predictions, implemented rigorous QC, and conducted genetic ancestry-specific GWAS to identify systematic errors or biases through a single command. GenoTools is a customizable tool that enables users to efficiently analyze and scale genotyping and sequencing (WGS and exome) data with reproducible and scalable ancestry, QC, and GWAS pipelines.
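
The idea of tracking samples, variants, and quality measures through a pipeline can be illustrated with a minimal, self-contained sketch. This is not the GenoTools API. The names `QCLog`, `filter_samples`, `filter_variants`, and `run_pipeline`, the use of `None` for a missing call, and the 0.75 call-rate thresholds are all hypothetical choices made for illustration. The sketch only shows how each QC step can record what it removed and why, so that exclusions stay traceable.

```python
from dataclasses import dataclass, field

MISSING = None  # hypothetical marker for a missing genotype call


@dataclass
class QCLog:
    """Records what each step removed, so every exclusion is traceable."""
    steps: list = field(default_factory=list)

    def record(self, step, kind, removed, threshold):
        self.steps.append(
            {"step": step, "kind": kind,
             "removed": sorted(removed), "threshold": threshold}
        )


def call_rate(calls):
    """Fraction of non-missing genotype calls in a non-empty sequence."""
    return sum(c is not MISSING for c in calls) / len(calls)


def filter_samples(genotypes, min_call_rate, log):
    """Drop samples (rows) whose call rate is below min_call_rate."""
    removed = {s for s, calls in genotypes.items()
               if call_rate(calls) < min_call_rate}
    log.record("sample_call_rate", "sample", removed, min_call_rate)
    return {s: calls for s, calls in genotypes.items() if s not in removed}


def filter_variants(genotypes, variant_ids, min_call_rate, log):
    """Drop variants (columns) whose call rate is below min_call_rate."""
    columns = list(zip(*genotypes.values()))
    keep = [i for i, col in enumerate(columns)
            if call_rate(col) >= min_call_rate]
    removed = {v for i, v in enumerate(variant_ids) if i not in keep}
    log.record("variant_call_rate", "variant", removed, min_call_rate)
    kept = {s: [calls[i] for i in keep] for s, calls in genotypes.items()}
    return kept, [variant_ids[i] for i in keep]


def run_pipeline(genotypes, variant_ids, sample_min=0.75, variant_min=0.75):
    """Run sample-level then variant-level QC, returning data plus the log."""
    log = QCLog()
    genotypes = filter_samples(genotypes, sample_min, log)
    genotypes, variant_ids = filter_variants(
        genotypes, variant_ids, variant_min, log
    )
    return genotypes, variant_ids, log
```

Calling `run_pipeline` on a small dictionary that maps sample IDs to lists of genotype calls returns the filtered data together with a log whose `steps` list names every removed sample and variant and the threshold that triggered the removal.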

Source journal
G3: Genes|Genomes|Genetics (GENETICS & HEREDITY)
CiteScore: 5.10
Self-citation rate: 3.80%
Articles published: 305
Review time: 3-8 weeks
About the journal: G3: Genes, Genomes, Genetics provides a forum for the publication of high-quality foundational research, particularly research that generates useful genetic and genomic information such as genome maps, single gene studies, genome-wide association and QTL studies, as well as genome reports, mutant screens, and advances in methods and technology. The Editorial Board of G3 believes that rapid dissemination of these data is the necessary foundation for analysis that leads to mechanistic insights. G3, published by the Genetics Society of America, meets the critical and growing need of the genetics community for rapid review and publication of important results in all areas of genetics. G3 offers the opportunity to publish the puzzling finding or to present unpublished results that may not have been submitted for review and publication due to a perceived lack of a potential high-impact finding. G3 has earned the DOAJ Seal, which is a mark of certification for open access journals, awarded by DOAJ to journals that achieve a high level of openness, adhere to Best Practice and high publishing standards.