A practical guide to identifying associations between tandem repeats and complex human traits using consensus genotypes from multiple tools.

Ibra Lujumba, Yagoub Adam, Helyaneh Ziaei Jam, Itunuoluwa Isewon, Nomakhosazana Monnakgotla, Yang Li, Blessing Onyido, Kakembo Fredrick, Faith Adegoke, Jerry Emmanuel, Jumoke Adeyemi, Olajumoke Ibitoye, Samuel Owusu-Ansah, Matthew Boladele Akanle, Habi Joseph, Mike Nsubuga, Ronald Galiwango, Martin Okitwi, Namuswe Magdalene, Odur Walter, Zama Mngadi, Marion Adebiyi, Jelili Oyelade, Melissa Nel, Daudi Jjingo, Melissa Gymrek, Ezekiel Adebiyi
Nature Protocols. Published 2025-09-01. DOI: 10.1038/s41596-025-01231-y

Abstract

Tandem repeats (TRs) are highly variable loci in the human genome that are linked to various human phenotypes. Accurate and reliable genotyping of TRs is important in understanding population TR variation dynamics and their effects in TR-trait association studies. In this protocol, we describe how to generate high-quality consensus TR genotypes for population genomics studies. In particular, we detail steps to: (i) perform TR genotyping from short-read whole-genome sequencing data by using the HipSTR, GangSTR, adVNTR and ExpansionHunter tools, (ii) perform quality control checks on TR genotypes by using TRTools and (iii) integrate TR genotypes from different tools by using EnsembleTR. We further discuss how to visualize and investigate TR variation patterns to identify population-specific expansions and perform TR-trait association analyses. We demonstrate the utility of these steps by analyzing a small dataset from the 1000 Genomes Project. In addition, we recapitulate a previously identified association between TR length and gene expression in the African population and provide a generalized discussion on TR analysis and its relevance to identifying complex traits. The expected time for installing the necessary software for each section is ~10 min. The expected run time on the user's desired dataset can vary from hours to days depending on factors such as the size of the data, input parameters and the capacity of the computing infrastructure.
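
As a rough illustration of the TR-trait association step mentioned above, the sketch below regresses a quantitative trait (for example, a gene expression value) on TR dosage, taken here as the sum of the two allele lengths in repeat units. This is a minimal sketch under stated assumptions, not the protocol's actual pipeline: the function name `tr_length_association`, the dosage encoding and the toy data are all hypothetical, and a real analysis would normally also adjust for covariates such as sex and population structure.

```python
from math import sqrt


def tr_length_association(genotypes, trait):
    """Ordinary least-squares fit of a trait on TR dosage.

    genotypes: list of (allele1_len, allele2_len) pairs, one per sample,
               with lengths in repeat units (hypothetical encoding).
    trait:     list of quantitative trait values, same order as genotypes.
    Returns (slope, intercept, pearson_r).
    """
    if len(genotypes) != len(trait):
        raise ValueError("genotypes and trait must have the same length")
    if len(genotypes) < 3:
        raise ValueError("need at least three samples")

    # Dosage = sum of both allele lengths (an assumed additive model).
    dosage = [a + b for a, b in genotypes]
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n

    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and trait must both vary across samples")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Toy data (invented for illustration only): four samples at one TR locus.
toy_genotypes = [(10, 10), (10, 12), (12, 12), (12, 14)]
toy_trait = [11.0, 12.0, 13.0, 14.0]
slope, intercept, r = tr_length_association(toy_genotypes, toy_trait)
```

In the toy data the trait rises linearly with dosage, so the fitted slope is the per-repeat-unit effect. In practice the same fit would be repeated per locus on consensus genotypes, with multiple-testing correction applied across loci.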
