High-Risk Sequence Prediction Model in DNA Storage: The LQSF Method.

IF 3.7 · Zone 4 (Biology) · JCR Q1 (Biochemical Research Methods)
Yitong Ma, Shuai Chen, Xu Qi, Zuhong Lu, Kun Bi
Journal: IEEE Transactions on NanoBioscience
Publication date: 2024-07-08
Publication type: Journal Article
DOI: 10.1109/TNB.2024.3424576 (https://doi.org/10.1109/TNB.2024.3424576)
Citations: 0

Abstract

Traditional DNA storage technologies rely on passive filtering for error correction during synthesis and sequencing, which leads to redundancy and inadequate error correction. To address this, the Low Quality Sequence Filter (LQSF) is introduced: a method that uses deep learning models to predict high-risk sequences. LQSF uses a classification model trained on error-prone sequences to filter out low-quality sequences before sequencing, reducing the time and resources needed in subsequent stages. Analysis showed a clear distinction between high- and low-quality sequences, supporting the efficacy of the method. Training and testing were conducted across various neural networks and test sets. All models achieved an AUC above 0.91 on ROC curves and above 0.95 on PR curves across the different datasets. Notably, models such as AlexNet, VGG16, and VGG19 achieved an AUC of 1.0 on the Original dataset. Further validation using Illumina sequencing data showed a strong correlation between model scores and sequence error-proneness, supporting the model's applicability. By introducing active sequence filtering at the encoding stage, the LQSF method holds promise for future DNA storage research and applications.
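
The score-then-filter workflow and the ROC AUC metric described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_risk_score` is a hypothetical stand-in heuristic (GC imbalance and homopolymer runs, two commonly cited sources of synthesis and sequencing errors), not the paper's trained networks, and the function names and the 0.5 threshold are not from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base in seq."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def toy_risk_score(seq: str) -> float:
    """Hypothetical stand-in for a trained classifier; returns a score in [0, 1]."""
    if not seq:
        return 0.0
    gc = sum(base in "GC" for base in seq) / len(seq)
    gc_penalty = abs(gc - 0.5) * 2  # 0 at 50% GC, 1 at 0% or 100% GC
    run_penalty = min(longest_homopolymer(seq) / len(seq), 1.0)
    return 0.5 * gc_penalty + 0.5 * run_penalty


def filter_sequences(
    seqs: Sequence[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split sequences into (kept, dropped); scores >= threshold are dropped."""
    kept: List[str] = []
    dropped: List[str] = []
    for s in seqs:
        (dropped if score(s) >= threshold else kept).append(s)
    return kept, dropped


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    kept, dropped = filter_sequences(["ACGT", "AAAA"], toy_risk_score)
    print("kept:", kept, "dropped:", dropped)
    print("AUC:", roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))
```

In a real pipeline the heuristic would be replaced by the trained model's predicted probability, and the threshold would be chosen from the ROC or PR curve rather than fixed in advance.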

Source journal
IEEE Transactions on NanoBioscience
Category: Engineering and Technology - Nanotechnology
CiteScore: 7.00
Self-citation rate: 5.10%
Articles published: 197
Review time: >12 weeks
About the journal: The IEEE Transactions on NanoBioscience reports on original, innovative, and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics span both foundations and applications. Specifically, the journal covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software data acquisition and analysis, and computer-based modelling (based on traditional or high-performance computing - parallel computers or computer networks).