Tutorial: guidelines for quality filtering of whole-exome and whole-genome sequencing data for population-scale association analyses
Julia M Sealock, Franjo Ivankovic, Calwing Liao, Siwei Chen, Claire Churchhouse, Konrad J Karczewski, Daniel P Howrigan, Benjamin M Neale
Nature Protocols (Journal Article), published 2025-03-28. DOI: 10.1038/s41596-025-01169-1
Abstract
Genetic sequencing technologies are powerful tools for identifying rare variants and genes associated with Mendelian and complex traits; indeed, whole-exome and whole-genome sequencing are increasingly popular methods for population-scale genetic studies. However, careful quality control steps should be taken to ensure study accuracy and reproducibility, and sequencing data require extensive quality filtering to delineate true variants from technical artifacts. Although processing standards are harmonized across pipelines to call variants from sequencing reads, there currently exists no standardized pipeline for conducting quality filtering on variant-level datasets for the purpose of population-scale association analysis. In this Tutorial, we discuss key quality control parameters, provide guidelines for conducting quality filtering of samples and variants, and compare commonly used software programs for quality control of samples, variants and genotypes from sequencing data. As sequencing data continue to gain popularity in genetic research, establishing standardized quality control practices is crucial to ensure consistent, reliable and reproducible results across studies.
About the journal:
Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured.
The specific techniques covered encompass a wide range, including but not limited to: Biochemistry, Cell biology, Cell culture, Chemical modification, Computational biology, Developmental biology, Epigenomics, Genetic analysis, Genetic modification, Genomics, Imaging, Immunology, Isolation, purification, and separation, Lipidomics, Metabolomics, Microbiology, Model organisms, Nanotechnology, Neuroscience, Nucleic-acid-based molecular biology, Pharmacology, Plant biology, Protein analysis, Proteomics, Spectroscopy, Structural biology, Synthetic chemistry, Tissue culture, Toxicology, and Virology.