cONcat: Computational reconstruction of concatenated fragments from long Oxford Nanopore reads.

IF 2.6 · CAS Zone 3 (Multidisciplinary journals) · JCR Q1, MULTIDISCIPLINARY SCIENCES

PLoS ONE 20(7): e0321246 · Pub Date: 2025-07-24 · eCollection Date: 2025-01-01 · DOI: 10.1371/journal.pone.0321246
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289010/pdf/

Alexander J Petri, Mai Thi-Huyen Nguyen, Anjali Rajwar, Erik Benson, Kristoffer Sahlin


Citations: 0

Abstract

Synthetic combinatorial DNA libraries are widely used to produce protein variants, to optimize binders, and to study protein-DNA interactions at high throughput. The libraries can be made by researchers or vendors, and high-throughput sequencing is used both for quality control and to study the outcome of selection experiments. Oxford Nanopore sequencing (ONT) is well suited to this task, as it produces long reads and can be run rapidly on low-cost instrumentation. However, it suffers from lower overall read accuracy and an uneven error profile. No current bioinformatics tool is well suited to deducing the composition and order of the constituent members of combinatorial libraries from ONT reads. We introduce cONcat, an algorithm that identifies the makeup of concatenated DNA fragments in a set of ONT sequencing reads, given a pool of known fragments. cONcat uses an edit distance-based recursive covering algorithm to find the best possible matchings between the fragments and the reads. In our experiments on simulated and experimental data, cONcat accurately detects the correct fragment coverings despite the short fragment sizes (< 20 bp) and the sequencing errors present in ONT reads. However, we find that the high error rates at the start of ONT reads make confident coverage there challenging, implying a need for experimental strategies that keep key sequence information out of the start of reads.
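The abstract describes cONcat's core step only as an edit distance-based recursive covering of each read by known fragments. The Python sketch below illustrates one way such a covering could work; it is not the cONcat implementation. Every detail here is an assumption: each fragment is located in the read by fitting (semi-global) edit-distance alignment (the hypothetical `best_window`), the fragment with the lowest per-base error rate is anchored first, and the hypothetical `cover` then recurses into the unexplained sequence to its left and right, accepting hits only up to an illustrative `max_error_rate` of 0.2. Fragment names and sequences are toy data.

```python
from typing import Dict, List, Tuple

# (start, end, fragment_name, edit_distance); coordinates refer to the full read
Hit = Tuple[int, int, str, int]


def best_window(fragment: str, text: str) -> Tuple[int, int, int]:
    """Fitting (semi-global) alignment: all of `fragment` must align, but the
    match may start and end anywhere in `text`. Returns (start, end, distance)
    for the substring text[start:end] with the lowest edit distance."""
    n = len(text)
    prev = [0] * (n + 1)             # row 0: skipping leading text is free
    prev_start = list(range(n + 1))  # an alignment entering column j starts at j
    for i in range(1, len(fragment) + 1):
        cur = [i] + [0] * n          # column 0: first i fragment bases unmatched
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            cost = 0 if fragment[i - 1] == text[j - 1] else 1
            cur[j], cur_start[j] = min(
                (prev[j - 1] + cost, prev_start[j - 1]),  # match / substitution
                (prev[j] + 1, prev_start[j]),             # fragment base missing in read
                (cur[j - 1] + 1, cur_start[j - 1]),       # extra base in read
            )
        prev, prev_start = cur, cur_start
    end = min(range(n + 1), key=lambda j: prev[j])  # skipping trailing text is free
    return prev_start[end], end, prev[end]


def cover(read: str, fragments: Dict[str, str],
          max_error_rate: float = 0.2, offset: int = 0) -> List[Hit]:
    """Hypothetical recursive covering: anchor the fragment with the lowest
    per-base error rate, then recurse into the sequence on its left and right.
    Regions with no acceptable hit are left uncovered."""
    if not read:
        return []
    candidates = []
    for name, seq in fragments.items():
        start, end, dist = best_window(seq, read)
        # Require a non-empty window so every recursive call gets a shorter read.
        if end > start and dist <= max_error_rate * len(seq):
            candidates.append((dist / len(seq), start, end, name, dist))
    if not candidates:
        return []
    _, start, end, name, dist = min(candidates)  # lowest error rate, then leftmost
    left = cover(read[:start], fragments, max_error_rate, offset)
    right = cover(read[end:], fragments, max_error_rate, offset + end)
    return left + [(offset + start, offset + end, name, dist)] + right


if __name__ == "__main__":
    fragments = {"A": "ACGTACGTAC", "B": "TTGGCCAATT", "C": "GATTACAGAT"}
    # Toy read: A + B (with one C->G substitution) + C
    read = "ACGTACGTAC" + "TTGGCGAATT" + "GATTACAGAT"
    for start, end, name, dist in cover(read, fragments):
        print(name, start, end, dist)
```

Requiring each accepted hit to span at least one base guarantees that every recursive call works on a strictly shorter read, so the recursion terminates. The sketch ignores reverse-complemented reads and ambiguous placements, and it does nothing special for the error-prone read start that the abstract highlights.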

Source journal

PLoS ONE (Subject area: Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

Journal description: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage