Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data

IF 1 Q4 GENETICS & HEREDITY
Shunichi Kosugi, Chikashi Terao
Human Genome Variation. Journal Article. Published 2024-04-17. DOI: 10.1038/s41439-024-00276-x (https://doi.org/10.1038/s41439-024-00276-x)

Abstract

Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
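The recall and precision figures discussed in the abstract, and the suggestion to combine several detection algorithms, can be illustrated with a minimal Python sketch. This is not the authors' evaluation framework: the `Variant` tuple, the exact-match comparison, the helper names, and all toy variants below are assumptions made for illustration only. Real benchmarks for indels and SVs normally normalize representations and allow breakpoint tolerance instead of requiring exact matches.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def precision_recall(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) using exact matching against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def union_calls(*call_sets: Set[Variant]) -> Set[Variant]:
    """Combine calls from several algorithms by taking their union."""
    merged: Set[Variant] = set()
    for calls in call_sets:
        merged |= calls
    return merged


def is_long_insertion(v: Variant, min_len: int = 10) -> bool:
    """True if the alternate allele adds more than min_len bases."""
    return len(v.alt) - len(v.ref) > min_len


# Toy truth set (made-up coordinates and alleles).
snv = Variant("chr1", 1000, "A", "G")
deletion = Variant("chr1", 2000, "ACGT", "A")
short_ins = Variant("chr1", 3000, "C", "CTT")
long_ins = Variant("chr1", 4000, "A", "A" + "T" * 12)
truth = {snv, deletion, short_ins, long_ins}

# Two hypothetical callers: A reports one false positive, B finds the long insertion.
false_positive = Variant("chr2", 500, "G", "C")
caller_a = {snv, deletion, false_positive}
caller_b = {snv, long_ins}

for name, calls in [("A", caller_a), ("B", caller_b), ("A+B", union_calls(caller_a, caller_b))]:
    p, r = precision_recall(calls, truth)
    print(f"caller {name}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the union recovers more true variants than either caller alone, at the cost of carrying over caller A's false positive. That trade-off is why a union is only one possible merging strategy; requiring support from two or more callers would favor precision instead.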

Source journal: Human Genome Variation (Biochemistry, Genetics and Molecular Biology: Genetics)
CiteScore: 2.30
Self-citation rate: 0.00%
Articles published: 39
Review time: 13 weeks