A Protocol to Extract a Specific Genomic Region from a Public Whole-Genome Database and Modify Analytical Bin Length for Population Genetic Studies.

Impact factor 2.3; JCR Q3, Biochemical Research Methods
Muhammad Shoaib Akhtar, Shoji Kawamura
Methods and Protocols, vol. 7, no. 4. Published 2024-07-27. DOI: 10.3390/mps7040057 (https://doi.org/10.3390/mps7040057). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11357298/pdf/
Citation count: 0

Abstract

With the advent of "next-generation" sequencing and the continuing decline in sequencing costs, a growing volume of genomic data has become available, including whole-genome, whole-exome, and targeted sequencing data. These approaches are used not only in large-scale sequencing projects, such as the 1000 Genomes Project and UK Biobank, but also by individual researchers. Evolutionary genetic analyses, such as the dN/dS ratio and Tajima's D, are increasingly in demand for whole-genome population data. Such analyses are often carried out with a uniform, user-defined bin size across the genome. However, they require subdividing a genomic region into functional units, such as protein-coding regions, introns, and untranslated regions, and computing these measures for large-scale data remains challenging. In a recent investigation, we devised a method to address this issue. The method requires a multi-sample VCF file containing population data, a reference genome, the target regions in a BED file, and a list of the samples to include in the analysis. Because the target regions are extracted into a new VCF file, population genetic analyses can then be restricted to those regions. We applied this approach to Tajima's D analysis of intact genes and pseudogenes, as well as non-coding regions.
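The two ideas in the abstract, restricting variants to BED-defined target regions and computing Tajima's D on the result, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`in_bed`, `alt_allele_count`, `tajimas_d`) are hypothetical, sites are assumed to be biallelic, and missing genotypes are simply not counted. A real workflow would more likely rely on dedicated VCF tools for the extraction step.

```python
import math


def in_bed(pos, intervals):
    """Return True if a 1-based VCF position lies in any BED interval.

    BED intervals are 0-based and half-open, so the interval (start, end)
    covers 1-based positions start + 1 through end.
    """
    return any(start < pos <= end for start, end in intervals)


def alt_allele_count(sample_fields):
    """Count ALT alleles in VCF sample fields such as '0|1' or '1/1:35'.

    Assumes biallelic sites with GT as the first FORMAT key; missing
    alleles ('.') are not counted.
    """
    count = 0
    for field in sample_fields:
        genotype = field.split(":")[0].replace("|", "/")
        count += sum(1 for allele in genotype.split("/") if allele == "1")
    return count


def tajimas_d(alt_counts, n):
    """Tajima's D from per-site ALT allele counts among n sampled chromosomes.

    Returns None when the statistic is undefined (n < 4 or no
    segregating sites).
    """
    seg = [k for k in alt_counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if n < 4 or s == 0:
        return None
    # Nucleotide diversity: mean number of pairwise differences.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

In use, one would keep only the VCF records whose position passes `in_bed` for a given functional unit (for example, the exons of one gene), reduce the selected samples' genotype columns to ALT counts with `alt_allele_count`, and pass those counts to `tajimas_d` with `n` equal to twice the number of diploid samples. Computing the statistic per functional unit rather than per fixed-width window is the contrast the abstract draws.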

Source journal: Methods and Protocols
Subject area: Biochemistry, Genetics and Molecular Biology (miscellaneous)
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 85
Review time: 8 weeks