Dmitry Meleshko, Rui Yang, Salil Maharjan, David C Danko, Anton Korobeynikov, Iman Hajirasouliha
{"title":"黑鸟:结构变异检测使用合成和低覆盖长读取。","authors":"Dmitry Meleshko, Rui Yang, Salil Maharjan, David C Danko, Anton Korobeynikov, Iman Hajirasouliha","doi":"10.1093/bioadv/vbaf151","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Recent benchmarks show that most structural variations, especially within 50-10,000 bp range cannot be resolved with short-read sequencing, but long-read structural variant callers perform better on the same datasets. However, high-coverage long-read sequencing is costly and requires substantial input DNA. Reducing coverage lowers cost but significantly impacts the performance of existing structural variation (SV) callers. Synthetic long-read technologies offer long-range information at lower cost, but leveraging them for SVs under 50 kbp remains challenging.</p><p><strong>Results: </strong>We propose a novel hybrid alignment- and local-assembly-based algorithm, Blackbird, that uses synthetic long reads and low-coverage long reads to improve structural variant detection. Instead of relying on whole-genome assembly, Blackbird uses a sliding window approach and synthetic long-read barcode information to assemble local segments, integrating long reads to improve structural variant detection accuracy. We evaluated Blackbird on real human genome datasets. On the HG002 Genome in a Bottle (GIAB) benchmark, Blackbird in hybrid mode demonstrated results comparable to state-of-the-art long-read tools, while using less long-read coverage. Blackbird requires only 5 <math><mo>×</mo></math> coverage to achieve F1-scores (0.835 and 0.808 for deletions and insertions) similar to PBSV and Sniffles2 using 10 <math><mo>×</mo></math> PacBio Hi-Fi long-read coverage.</p><p><strong>Availability and implementation: </strong>Blackbird is available at https://github.com/1dayac/Blackbird.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf151"},"PeriodicalIF":2.8000,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237510/pdf/","citationCount":"0","resultStr":"{\"title\":\"Blackbird: structural variant detection using synthetic and low-coverage long-reads.\",\"authors\":\"Dmitry Meleshko, Rui Yang, Salil Maharjan, David C Danko, Anton Korobeynikov, Iman Hajirasouliha\",\"doi\":\"10.1093/bioadv/vbaf151\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Recent benchmarks show that most structural variations, especially within 50-10,000 bp range cannot be resolved with short-read sequencing, but long-read structural variant callers perform better on the same datasets. However, high-coverage long-read sequencing is costly and requires substantial input DNA. Reducing coverage lowers cost but significantly impacts the performance of existing structural variation (SV) callers. Synthetic long-read technologies offer long-range information at lower cost, but leveraging them for SVs under 50 kbp remains challenging.</p><p><strong>Results: </strong>We propose a novel hybrid alignment- and local-assembly-based algorithm, Blackbird, that uses synthetic long reads and low-coverage long reads to improve structural variant detection. Instead of relying on whole-genome assembly, Blackbird uses a sliding window approach and synthetic long-read barcode information to assemble local segments, integrating long reads to improve structural variant detection accuracy. We evaluated Blackbird on real human genome datasets. On the HG002 Genome in a Bottle (GIAB) benchmark, Blackbird in hybrid mode demonstrated results comparable to state-of-the-art long-read tools, while using less long-read coverage. Blackbird requires only 5 <math><mo>×</mo></math> coverage to achieve F1-scores (0.835 and 0.808 for deletions and insertions) similar to PBSV and Sniffles2 using 10 <math><mo>×</mo></math> PacBio Hi-Fi long-read coverage.</p><p><strong>Availability and implementation: </strong>Blackbird is available at https://github.com/1dayac/Blackbird.</p>\",\"PeriodicalId\":72368,\"journal\":{\"name\":\"Bioinformatics advances\",\"volume\":\"5 1\",\"pages\":\"vbaf151\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-07-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237510/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioadv/vbaf151\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbaf151","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Blackbird: structural variant detection using synthetic and low-coverage long-reads.
Motivation: Recent benchmarks show that most structural variations, especially within 50-10,000 bp range cannot be resolved with short-read sequencing, but long-read structural variant callers perform better on the same datasets. However, high-coverage long-read sequencing is costly and requires substantial input DNA. Reducing coverage lowers cost but significantly impacts the performance of existing structural variation (SV) callers. Synthetic long-read technologies offer long-range information at lower cost, but leveraging them for SVs under 50 kbp remains challenging.
Results: We propose a novel hybrid alignment- and local-assembly-based algorithm, Blackbird, that uses synthetic long reads and low-coverage long reads to improve structural variant detection. Instead of relying on whole-genome assembly, Blackbird uses a sliding window approach and synthetic long-read barcode information to assemble local segments, integrating long reads to improve structural variant detection accuracy. We evaluated Blackbird on real human genome datasets. On the HG002 Genome in a Bottle (GIAB) benchmark, Blackbird in hybrid mode demonstrated results comparable to state-of-the-art long-read tools, while using less long-read coverage. Blackbird requires only 5 coverage to achieve F1-scores (0.835 and 0.808 for deletions and insertions) similar to PBSV and Sniffles2 using 10 PacBio Hi-Fi long-read coverage.
Availability and implementation: Blackbird is available at https://github.com/1dayac/Blackbird.