基于前馈变压器的纳米孔序列信号端到端模拟。

Denis Beslic, Martin Kucklick, Susanne Engelmann, Stephan Fuchs, Bernhard Y Renard, Nils Körber
{"title":"基于前馈变压器的纳米孔序列信号端到端模拟。","authors":"Denis Beslic, Martin Kucklick, Susanne Engelmann, Stephan Fuchs, Bernhard Y Renard, Nils Körber","doi":"10.1093/bioinformatics/btae744","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Nanopore sequencing represents a significant advancement in genomics, enabling direct long-read DNA sequencing at the single-molecule level. Accurate simulation of nanopore sequencing signals from nucleotide sequences is crucial for method development and for complementing experimental data. Most existing approaches rely on predefined statistical models, which may not adequately capture the properties of experimental signal data. Furthermore, these simulators were developed for earlier versions of nanopore chemistry, which limits their applicability and adaptability to the latest flow cell data.</p><p><strong>Results: </strong>To enhance the quality of artificial signals, we introduce seq2squiggle, a novel transformer-based, non-autoregressive model designed to generate nanopore sequencing signals from nucleotide sequences. Unlike existing simulators that rely on static k-mer models, our approach learns sequential contextual information from segmented signal data. We benchmark seq2squiggle against state-of-the-art simulators on real experimental R9.4.1 and R10.4.1 data, evaluating signal similarity, basecalling accuracy, and variant detection rates. Seq2squiggle consistently outperforms existing tools across multiple datasets, demonstrating superior similarity to real data and offering a robust solution for simulating nanopore sequencing signals with the latest flow cell generation.</p><p><strong>Availability and implementation: </strong>seq2squiggle is freely available on GitHub at: github.com/ZKI-PH-ImageAnalysis/seq2squiggle.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729726/pdf/","citationCount":"0","resultStr":"{\"title\":\"End-to-end simulation of nanopore sequencing signals with feed-forward transformers.\",\"authors\":\"Denis Beslic, Martin Kucklick, Susanne Engelmann, Stephan Fuchs, Bernhard Y Renard, Nils Körber\",\"doi\":\"10.1093/bioinformatics/btae744\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Nanopore sequencing represents a significant advancement in genomics, enabling direct long-read DNA sequencing at the single-molecule level. Accurate simulation of nanopore sequencing signals from nucleotide sequences is crucial for method development and for complementing experimental data. Most existing approaches rely on predefined statistical models, which may not adequately capture the properties of experimental signal data. Furthermore, these simulators were developed for earlier versions of nanopore chemistry, which limits their applicability and adaptability to the latest flow cell data.</p><p><strong>Results: </strong>To enhance the quality of artificial signals, we introduce seq2squiggle, a novel transformer-based, non-autoregressive model designed to generate nanopore sequencing signals from nucleotide sequences. Unlike existing simulators that rely on static k-mer models, our approach learns sequential contextual information from segmented signal data. We benchmark seq2squiggle against state-of-the-art simulators on real experimental R9.4.1 and R10.4.1 data, evaluating signal similarity, basecalling accuracy, and variant detection rates. Seq2squiggle consistently outperforms existing tools across multiple datasets, demonstrating superior similarity to real data and offering a robust solution for simulating nanopore sequencing signals with the latest flow cell generation.</p><p><strong>Availability and implementation: </strong>seq2squiggle is freely available on GitHub at: github.com/ZKI-PH-ImageAnalysis/seq2squiggle.</p>\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-12-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729726/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btae744\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae744","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

动机:纳米孔测序代表了基因组学的重大进步,使单分子水平上的直接长读DNA测序成为可能。精确模拟核苷酸序列的纳米孔测序信号对于方法开发和补充实验数据至关重要。大多数现有的方法依赖于预定义的统计模型,这可能不能充分捕捉实验信号数据的特性。此外,这些模拟器是为早期版本的纳米孔化学开发的,这限制了它们对最新流动电池数据的适用性和适应性。结果:为了提高人工信号的质量,我们引入了seq2squiggle,这是一种基于变压器的非自回归模型,旨在从核苷酸序列中产生纳米孔测序信号。与现有的依赖静态k-mer模型的模拟器不同,我们的方法从分段信号数据中学习顺序上下文信息。我们将seq2squiggle与最先进的模拟器在真实实验R9.4.1和R10.4.1数据上进行基准测试,评估信号相似度,基本调用准确性和变体检测率。Seq2squiggle在多个数据集上始终优于现有工具,展示了与真实数据的卓越相似性,并为模拟最新一代流动池的纳米孔测序信号提供了强大的解决方案。可用性:seq2squiggle在GitHub上免费提供:github.com/ZKI-PH-ImageAnalysis/seq2squiggle.Supplementary信息:补充数据可在Bioinformatics在线获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
End-to-end simulation of nanopore sequencing signals with feed-forward transformers.

Motivation: Nanopore sequencing represents a significant advancement in genomics, enabling direct long-read DNA sequencing at the single-molecule level. Accurate simulation of nanopore sequencing signals from nucleotide sequences is crucial for method development and for complementing experimental data. Most existing approaches rely on predefined statistical models, which may not adequately capture the properties of experimental signal data. Furthermore, these simulators were developed for earlier versions of nanopore chemistry, which limits their applicability and adaptability to the latest flow cell data.

Results: To enhance the quality of artificial signals, we introduce seq2squiggle, a novel transformer-based, non-autoregressive model designed to generate nanopore sequencing signals from nucleotide sequences. Unlike existing simulators that rely on static k-mer models, our approach learns sequential contextual information from segmented signal data. We benchmark seq2squiggle against state-of-the-art simulators on real experimental R9.4.1 and R10.4.1 data, evaluating signal similarity, basecalling accuracy, and variant detection rates. Seq2squiggle consistently outperforms existing tools across multiple datasets, demonstrating superior similarity to real data and offering a robust solution for simulating nanopore sequencing signals with the latest flow cell generation.

Availability and implementation: seq2squiggle is freely available on GitHub at: github.com/ZKI-PH-ImageAnalysis/seq2squiggle.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信