BVSim: A benchmarking variation simulator mimicking human variation spectrum.

Impact factor 11.8; Zone 2 (Biology); JCR Q1, Multidisciplinary Sciences
Yongyi Luo, Zhen Zhang, Shu Wang, Jiandong Shi, Jingyu Hao, Sheng Lian, Taobo Hu, Toyotaka Ishibashi, Depeng Wang, Weichuan Yu, Xiaodan Fan
GigaScience, volume 14. Publication date: 2025-01-06. DOI: 10.1093/gigascience/giaf095 (https://doi.org/10.1093/gigascience/giaf095). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12398280/pdf/

Abstract

Background: Genomic variations, including single-nucleotide polymorphisms, small insertions and deletions, and structural variations, are crucial for understanding evolution and disease. However, comprehensive simulation tools for benchmarking genomic analysis methods are lacking. Existing simulators do not accurately represent the nonuniform distribution and length patterns of structural variations in human genomes, and simulating complex structural variations remains challenging.

Results: We present BVSim, a flexible tool for probabilistic simulation of genomic variations that focuses primarily on human patterns while accommodating other species. BVSim simulates both simple and complex structural variations, as well as small variants, by mimicking real-life variation distributions, which often show higher frequencies near telomeres and within tandem repeat regions. Notably, users can supply one or more benchmark samples from any reference genome, and BVSim summarizes and reproduces the distribution patterns of structural variation positions and lengths specific to that species. Its compatibility with standard file formats eases integration into genomic research workflows, making it a useful resource for benchmarking downstream tools such as variant callers. In numerical experiments, BVSim generated more realistic sequences that differed significantly from the outputs of other simulators.
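
The idea of a nonuniform positional distribution can be illustrated with a minimal Python sketch. This is not BVSim's implementation or interface: the function names, the fixed-size binning, and the boost factors are all hypothetical choices made for illustration. It assumes only that bins near the chromosome ends and bins overlapping tandem repeat intervals should be sampled more often than ordinary bins.

```python
import random


def bin_weights(chrom_len, bin_size, telomere_span, repeat_regions,
                telomere_boost=5.0, repeat_boost=3.0):
    """Return one sampling weight per fixed-size bin along a chromosome.

    All parameters are illustrative assumptions, not BVSim options.
    repeat_regions is a list of half-open (start, end) intervals.
    """
    n_bins = (chrom_len + bin_size - 1) // bin_size
    weights = []
    for i in range(n_bins):
        start = i * bin_size
        end = min(start + bin_size, chrom_len)
        w = 1.0  # baseline weight for an ordinary bin
        # Bins overlapping either chromosome end get the telomere boost.
        if start < telomere_span or end > chrom_len - telomere_span:
            w *= telomere_boost
        # Bins overlapping any tandem repeat interval get the repeat boost.
        if any(start < r_end and end > r_start
               for r_start, r_end in repeat_regions):
            w *= repeat_boost
        weights.append(w)
    return weights


def sample_sv_positions(chrom_len, n_variants, weights, bin_size, seed=0):
    """Draw variant start positions: pick a bin by weight, then a
    uniform position inside that bin. Returns a sorted list."""
    rng = random.Random(seed)
    bins = rng.choices(range(len(weights)), weights=weights, k=n_variants)
    positions = []
    for b in bins:
        start = b * bin_size
        end = min(start + bin_size, chrom_len)
        positions.append(rng.randrange(start, end))
    return sorted(positions)


if __name__ == "__main__":
    # Toy chromosome of 1 Mb with one hypothetical tandem repeat interval.
    w = bin_weights(1_000_000, 10_000, 50_000, [(400_000, 420_000)])
    print(sample_sv_positions(1_000_000, 10, w, 10_000, seed=1))
```

In a sketch like this, weights estimated from real benchmark samples (for example, per-bin counts of observed structural variations) could replace the hand-set boost factors, which is closer in spirit to the empirical distributions described above.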

Conclusions: BVSim is written in Python and freely available to noncommercial users under the GPL3 license. Source code, application guide, and toy examples are provided on the GitHub page at https://github.com/YongyiLuo98/BVSim. The tool is registered in SciCrunch (RRID:SCR_026926), bio.tools (biotools:BVSim), and WorkflowHub (doi:10.48546/WORKFLOWHUB.WORKFLOW.1361.1).

Source journal: GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.