Design and structure of overlapping regions in PCA via deep learning

IF 4.4 · Zone 2 (Biology) · JCR Q1, BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Yan Zheng, Xi-Chen Cui, Fei Guo, Ming-Liang Dou, Ze-Xiong Xie, Ying-Jin Yuan
Synthetic and Systems Biotechnology, vol. 10, no. 2, pp. 442-451. Published 2024-12-27. DOI: 10.1016/j.synbio.2024.12.007
https://www.sciencedirect.com/science/article/pii/S2405805X24001595
Journal article · Not open access · Citations: 0

Abstract

Polymerase cycling assembly (PCA) is the predominant method for synthesizing kilobase-length DNA fragments, and the design of the overlapping regions between oligonucleotides is the key factor determining its success rate. However, some DNA sequences remain challenging to design and construct in genome synthesis. Here, we propose a deep learning model, trained on extensive synthesis data, that learns latent sequence representations of overlapping regions and achieves an area under the precision-recall curve (AUPR) of 0.805. Using this model, we developed SmartCut, an algorithm that designs oligonucleotides to improve the success rate of PCA experiments. SmartCut was successfully applied to sequences with diverse synthesis constraints, 80.4% of which were synthesized in a single round. We further identified structural differences between overlapping and non-overlapping regions in major groove width, stagger, slide, and centroid distance, which help explain the model's predictions from a physicochemical perspective. This approach makes investigations into genome synthesis more streamlined and efficient.
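The abstract does not describe how SmartCut chooses its cut points. As a rough illustration of the underlying design problem, the Python sketch below splits a target sequence into PCA oligonucleotides that alternate between the two strands, with adjacent oligos sharing a fixed-length overlap, and greedily places each overlap at the highest-scoring position that keeps every oligo under a length limit. This is not SmartCut: `overlap_score` is a hypothetical placeholder (a simple GC-content heuristic) standing in for the paper's trained model, and the function name `design_oligos` and its defaults (60-nt maximum oligo length, 20-nt overlaps, at least 10 non-overlapping bases between overlaps) are illustrative assumptions.

```python
# Hypothetical illustration only; this is NOT the SmartCut algorithm.
from dataclasses import dataclass

# Complement table for the four canonical DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Oligo:
    start: int   # 0-based start on the top strand (inclusive)
    end: int     # 0-based end on the top strand (exclusive)
    sense: bool  # PCA oligos alternate between top and bottom strands
    seq: str     # 5'->3' sequence of the oligo itself


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overlap_score(overlap: str) -> float:
    """Hypothetical placeholder scorer: prefers ~50% GC content.

    A trained model predicting overlap quality would replace this function.
    """
    gc = sum(base in "GC" for base in overlap) / len(overlap)
    return -abs(gc - 0.5)


def design_oligos(seq: str, max_len: int = 60, overlap_len: int = 20,
                  min_gap: int = 10) -> list[Oligo]:
    """Greedily pick overlap windows so that every oligo is <= max_len nt."""
    seq = seq.upper()
    n = len(seq)
    overlaps: list[int] = []  # start positions of the chosen overlap windows
    oligo_start, prev_end = 0, 0
    while n - oligo_start > max_len:
        # Keep >= min_gap non-overlapping bases after the previous overlap and
        # before the sequence end, and keep the current oligo within max_len.
        first = prev_end + min_gap
        last = min(oligo_start + max_len - overlap_len, n - overlap_len - min_gap)
        if first > last:
            raise ValueError("no valid overlap window; relax the constraints")
        best = max(range(first, last + 1),
                   key=lambda s: overlap_score(seq[s:s + overlap_len]))
        overlaps.append(best)
        oligo_start, prev_end = best, best + overlap_len

    # Oligo i runs from the start of overlap i-1 to the end of overlap i.
    starts = [0] + overlaps
    ends = [s + overlap_len for s in overlaps] + [n]
    oligos = []
    for i, (a, b) in enumerate(zip(starts, ends)):
        top = seq[a:b]
        sense = i % 2 == 0
        oligos.append(Oligo(a, b, sense, top if sense else reverse_complement(top)))
    return oligos


if __name__ == "__main__":
    target = "ATGACCGTTAGCAAGGCTTAC" * 8  # 168-nt toy sequence
    for oligo in design_oligos(target):
        strand = "+" if oligo.sense else "-"
        print(f"{oligo.start:4d}-{oligo.end:<4d} {strand} {oligo.seq}")
```

A greedy pass optimizes each overlap locally; a dynamic-programming search over all cut positions would optimize the total score instead, and a practical design tool would also check melting temperature, mispriming, and secondary structure.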
Source journal
Synthetic and Systems Biotechnology (BIOTECHNOLOGY & APPLIED MICROBIOLOGY)
CiteScore: 6.90
Self-citation rate: 12.50%
Articles published: 90
Review time: 67 days
Journal description: Synthetic and Systems Biotechnology aims to promote the communication of original research in synthetic and systems biology, with a strong emphasis on applications in biotechnology. It is a quarterly peer-reviewed journal led by Editor-in-Chief Lixin Zhang. The journal publishes high-quality research focusing on integrative approaches to understanding and designing biological systems, as well as research that applies systems and synthetic biology to natural systems. It publishes Articles, Short Notes, Methods, Mini Reviews, Commentaries, and Conference Reviews.