SV-JIM, detailed pairwise structural variant calling using long-reads and genome assemblies

IF 4.2 | Zone 3 (Biology) | JCR Q1 (Biochemical Research Methods)
Clarence Todd, Lingling Jin, Ian McQuillan
DOI: 10.1016/j.ymeth.2024.12.015
Journal: Methods, volume 234, pages 305-313
Publication date: 2025-02-01
Publication type: Journal Article
Publisher page: https://www.sciencedirect.com/science/article/pii/S104620232500009X
Citations: 0

Abstract

This paper proposes a detailed process for structural variant (SV) calling that uses both genome assemblies and long reads and permits a data-driven assessment of multiple SV callers. The process is implemented as a software pipeline named Structural Variant - Jaccard Index Measure, or SV-JIM, using the Snakemake [20] workflow management system. Like most state-of-the-art SV callers, SV-JIM detects variations between pairs of genomes, but it streamlines the numerous SV calling stages into a single process for user convenience. It evaluates the multiple SV sets produced using the Jaccard index to identify those with the highest consistency among the included SV callers, and then produces aggregated SV results based on how many callers supported each reported SV. For validation, SV-JIM was assessed through three case studies, on the Homo sapiens genome and two plant genomes (Brassica nigra and Arabidopsis thaliana). Executing SV-JIM revealed substantial inter-caller variance, with results differing by tens of thousands on the larger Brassica nigra and Homo sapiens genomes. Further, aggregating the SV sets made it simpler to retain the less frequently occurring SV types by requiring a minimum level of support rather than a specific combination of SV callers. Finally, these case studies identified a potential for inflated precision reporting that can occur during evaluation. SV-JIM is publicly available under the MIT license at https://github.com/USask-BINFO/SV-JIM.
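
The two ideas named in the abstract, scoring agreement between callers with the Jaccard index and keeping SVs that reach a minimum level of caller support, can be illustrated with a minimal Python sketch. This is not SV-JIM's implementation: the caller names, the (chromosome, type, start, end) tuple representation, and the exact-match comparison are assumptions made for illustration only, and real SV comparison typically tolerates small breakpoint differences.

```python
from collections import Counter
from itertools import combinations

# Hypothetical SV representation: (chromosome, SV type, start, end).
# Exact-tuple matching is a simplifying assumption, not SV-JIM's method.


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(call_sets):
    """Jaccard index for every pair of callers, keyed by (caller1, caller2)."""
    return {
        (c1, c2): jaccard(call_sets[c1], call_sets[c2])
        for c1, c2 in combinations(sorted(call_sets), 2)
    }


def aggregate_by_support(call_sets, min_support):
    """Keep the SVs reported by at least `min_support` callers."""
    support = Counter(sv for calls in call_sets.values() for sv in calls)
    return {sv for sv, n in support.items() if n >= min_support}


# Toy call sets from three made-up callers.
call_sets = {
    "caller_a": {
        ("chr1", "DEL", 100, 250),
        ("chr1", "INS", 900, 900),
        ("chr2", "INV", 50, 400),
    },
    "caller_b": {("chr1", "DEL", 100, 250), ("chr2", "INV", 50, 400)},
    "caller_c": {("chr1", "DEL", 100, 250), ("chr3", "DUP", 10, 80)},
}

scores = pairwise_jaccard(call_sets)
consensus = aggregate_by_support(call_sets, min_support=2)
```

In this toy input, caller_a and caller_b share two of three distinct SVs, so their Jaccard index is 2/3, and with min_support=2 only the deletion and the inversion are retained. Filtering on a support count, rather than on a fixed pair of callers, means an SV type that only some callers report well can still be kept whenever any two of them agree.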
Source journal: Methods
Category: Biology, Biochemical Research Methods
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.