Blackbird: structural variant detection using synthetic and low-coverage long-reads.

Impact factor: 2.8 | JCR: Q2 | Category: Mathematical & Computational Biology
Bioinformatics Advances | Pub Date: 2025-07-04 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf151
Dmitry Meleshko, Rui Yang, Salil Maharjan, David C Danko, Anton Korobeynikov, Iman Hajirasouliha
Volume 5(1), vbaf151 | Publication type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237510/pdf/
Citations: 0

Abstract

Blackbird: structural variant detection using synthetic and low-coverage long-reads.

Motivation: Recent benchmarks show that most structural variations, especially those in the 50-10,000 bp range, cannot be resolved with short-read sequencing, whereas long-read structural variant callers perform better on the same datasets. However, high-coverage long-read sequencing is costly and requires substantial input DNA. Reducing coverage lowers cost but significantly degrades the performance of existing structural variation (SV) callers. Synthetic long-read technologies offer long-range information at lower cost, but leveraging them for SVs under 50 kbp remains challenging.

Results: We propose Blackbird, a novel hybrid alignment- and local-assembly-based algorithm that uses synthetic long reads and low-coverage long reads to improve structural variant detection. Instead of relying on whole-genome assembly, Blackbird uses a sliding-window approach and synthetic long-read barcode information to assemble local segments, integrating long reads to improve detection accuracy. We evaluated Blackbird on real human genome datasets. On the HG002 Genome in a Bottle (GIAB) benchmark, Blackbird in hybrid mode produced results comparable to state-of-the-art long-read tools while using less long-read coverage. Blackbird requires only 5× coverage to achieve F1-scores (0.835 for deletions and 0.808 for insertions) similar to those of PBSV and Sniffles2 using 10× PacBio Hi-Fi long-read coverage.
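
The sliding-window and barcode idea described above can be illustrated with a minimal sketch. This is not Blackbird's implementation: the window size, the step, the function names, and the `(position, barcode)` input format are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows how a chromosome might be split into overlapping windows and how the barcodes of reads that start in each window might be collected before any local assembly.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical parameters, chosen only for illustration; the abstract does
# not state the window size or overlap that Blackbird actually uses.
WINDOW_SIZE = 50_000
WINDOW_STEP = 40_000

Window = Tuple[int, int]


def sliding_windows(chrom_length: int,
                    size: int = WINDOW_SIZE,
                    step: int = WINDOW_STEP) -> Iterator[Window]:
    """Yield half-open [start, end) windows that together cover a chromosome."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    start = 0
    while start < chrom_length:
        end = min(start + size, chrom_length)
        yield start, end
        if end == chrom_length:
            break  # the last window already reaches the chromosome end
        start += step


def barcodes_per_window(alignments: Iterable[Tuple[int, str]],
                        chrom_length: int,
                        size: int = WINDOW_SIZE,
                        step: int = WINDOW_STEP) -> Dict[Window, Set[str]]:
    """Map each window to the barcodes of reads whose start falls inside it.

    `alignments` is an assumed, simplified input: (start position, barcode)
    pairs for reads aligned to a single chromosome.
    """
    windows = list(sliding_windows(chrom_length, size, step))
    result: Dict[Window, Set[str]] = {w: set() for w in windows}
    for pos, barcode in alignments:
        for start, end in windows:
            if start <= pos < end:
                result[(start, end)].add(barcode)
    return result


if __name__ == "__main__":
    toy_reads = [(10, "A"), (45_000, "B"), (95_000, "A")]
    for window, barcodes in barcodes_per_window(toy_reads, 100_000).items():
        print(window, sorted(barcodes))
```

In a pipeline of this kind, the reads that carry the barcodes collected for a window would presumably be handed to a local assembler together with any long reads overlapping the same window. That step is omitted here because the abstract gives no details about it.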

Availability and implementation: Blackbird is available at https://github.com/1dayac/Blackbird.
