Ibra Lujumba, Yagoub Adam, Helyaneh Ziaei Jam, Itunuoluwa Isewon, Nomakhosazana Monnakgotla, Yang Li, Blessing Onyido, Kakembo Fredrick, Faith Adegoke, Jerry Emmanuel, Jumoke Adeyemi, Olajumoke Ibitoye, Samuel Owusu-Ansah, Matthew Boladele Akanle, Habi Joseph, Mike Nsubuga, Ronald Galiwango, Martin Okitwi, Namuswe Magdalene, Odur Walter, Zama Mngadi, Marion Adebiyi, Jelili Oyelade, Melissa Nel, Daudi Jjingo, Melissa Gymrek, Ezekiel Adebiyi
A practical guide to identifying associations between tandem repeats and complex human traits using consensus genotypes from multiple tools.
Nature Protocols. Published 2025-09-01. DOI: 10.1038/s41596-025-01231-y (https://doi.org/10.1038/s41596-025-01231-y)
Abstract
Tandem repeats (TRs) are highly variable loci in the human genome that are linked to various human phenotypes. Accurate and reliable genotyping of TRs is important in understanding population TR variation dynamics and their effects in TR-trait association studies. In this protocol, we describe how to generate high-quality consensus TR genotypes for population genomics studies. In particular, we detail steps to: (i) perform TR genotyping from short-read whole-genome sequencing data by using the HipSTR, GangSTR, adVNTR and ExpansionHunter tools, (ii) perform quality control checks on TR genotypes by using TRTools and (iii) integrate TR genotypes from different tools by using EnsembleTR. We further discuss how to visualize and investigate TR variation patterns to identify population-specific expansions and perform TR-trait association analyses. We demonstrate the utility of these steps by analyzing a small dataset from the 1000 Genomes Project. In addition, we recapitulate a previously identified association between TR length and gene expression in the African population and provide a generalized discussion on TR analysis and its relevance to identifying complex traits. The expected time for installing the necessary software for each section is ~10 min. The expected run time on the user's desired dataset can vary from hours to days depending on factors such as the size of the data, input parameters and the capacity of the computing infrastructure.
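To make the idea of a TR-trait association concrete, the following is a minimal sketch of one common way to relate TR length to a quantitative trait such as gene expression: summarize each diploid genotype as the sum of its two allele lengths and fit a simple linear regression. The abstract does not specify the statistical model the protocol uses, so the function names (`tr_dosage`, `linear_association`), the summed-length encoding and the toy values are all assumptions made for illustration; none of them come from the protocol or from the 1000 Genomes Project.

```python
from math import sqrt


def tr_dosage(genotype):
    """Sum of the two allele lengths (in repeat units) of a diploid TR call.

    Hypothetical helper: the summed-length encoding is an assumption,
    not necessarily the encoding used in the protocol.
    """
    allele_a, allele_b = genotype
    return allele_a + allele_b


def linear_association(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, pearson_r).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in TR length or trait")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Made-up toy values for illustration only (not real genotypes or expression).
genotypes = [(10, 10), (10, 12), (12, 12), (12, 14), (14, 14)]
expression = [1.0, 1.5, 2.0, 2.5, 3.0]

dosages = [tr_dosage(g) for g in genotypes]
slope, intercept, pearson_r = linear_association(dosages, expression)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={pearson_r:.3f}")
```

A real analysis would typically also adjust for covariates such as population structure and sex, and correct for testing many TRs; this sketch leaves those steps out to keep the core computation visible.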
About the journal:
Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured.
The specific techniques covered encompass a wide range, including but not limited to: Biochemistry, Cell biology, Cell culture, Chemical modification, Computational biology, Developmental biology, Epigenomics, Genetic analysis, Genetic modification, Genomics, Imaging, Immunology, Isolation, purification, and separation, Lipidomics, Metabolomics, Microbiology, Model organisms, Nanotechnology, Neuroscience, Nucleic-acid-based molecular biology, Pharmacology, Plant biology, Protein analysis, Proteomics, Spectroscopy, Structural biology, Synthetic chemistry, Tissue culture, Toxicology, and Virology.