M.N. van Baardwijk , L.S.E.M. Heijnen , H. Zhao , M. Baudis , A.P. Stubbs
Genomics, vol. 116, no. 6, Article 110962. Published 2024-11-01. Journal Article. Impact factor 3.4 (JCR Q2, Biotechnology & Applied Microbiology).
DOI: 10.1016/j.ygeno.2024.110962
URL: https://www.sciencedirect.com/science/article/pii/S0888754324001836
A systematic benchmark of copy number variation detection tools for high density SNP genotyping arrays
Copy Number Variations (CNVs) are crucial in various diseases, especially cancer, but detecting them accurately from SNP genotyping arrays remains challenging. Therefore, this study benchmarked five CNV detection tools—PennCNV, QuantiSNP, iPattern, EnsembleCNV, and R-GADA—using SNP array and WGS data from 2002 individuals of the DRAGEN re-analysis of the 1000 Genomes project. Results showed significant variability in tool performance. R-GADA had the highest recall but low precision, while PennCNV was the most reliable in terms of precision and F1 score. EnsembleCNV improved recall by combining multiple callers but increased false positives. Overall, current tools, including new methods, do not outperform PennCNV in precise CNV detection. Improved reference data and consensus on true positive CNV calls are necessary. This study provides valuable insights and scalable workflows for researchers selecting CNV detection methods in future studies.
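To make the precision, recall, and F1 comparison in the abstract concrete, here is a minimal Python sketch of how array-based CNV calls might be scored against a WGS-derived truth set. It is not the authors' workflow: the `CNV` record, the `score` function, the requirement that calls share chromosome and type, and the 50% reciprocal-overlap matching threshold are all assumptions chosen for illustration, since the abstract does not state how a true positive was defined.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    """Hypothetical CNV record; field names are illustrative only."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two fractions of each interval covered by the intersection."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def score(calls, truth, min_overlap=0.5):
    """Return (precision, recall, F1) of calls against a truth set.

    A call counts as a true positive if it reaches min_overlap reciprocal
    overlap with any truth CNV (an assumed criterion, not the paper's).
    """
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented for this example.
    truth = [CNV("chr1", 1000, 2000, "DEL"), CNV("chr2", 5000, 9000, "DUP")]
    calls = [
        CNV("chr1", 1100, 2100, "DEL"),  # overlaps the chr1 deletion well
        CNV("chr3", 100, 900, "DEL"),    # no counterpart in the truth set
        CNV("chr2", 5000, 6000, "DUP"),  # covers too little of the chr2 duplication
    ]
    print(score(calls, truth))
```

Under this kind of scoring, a caller that reports many segments tends to recover more truth CNVs (higher recall) while accumulating unmatched calls (lower precision), which is the trade-off the abstract describes for R-GADA and EnsembleCNV relative to PennCNV.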
About the journal:
Genomics is a forum for describing the development of genome-scale technologies and their application to all areas of biological investigation.
As a journal that has evolved with the field that carries its name, Genomics focuses on the development and application of cutting-edge methods, addressing fundamental questions with potential interest to a wide audience. Our aim is to publish the highest quality research and to provide authors with rapid, fair and accurate review and publication of manuscripts falling within our scope.