Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding

Bioinformatics Advances (IF 2.4, JCR Q2, Mathematical & Computational Biology)
Published: 2024-12-30; eCollection: 2025-01-01; DOI: 10.1093/bioadv/vbae184
Orhan Sari, Ziying Liu, Youlian Pan, Xiaojian Shao
Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding.

Motivation: The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 system is a ground-breaking genome editing tool that has revolutionized cell and gene therapies. A key factor in its success is the design of an optimal single-guide RNA (sgRNA) with high on-target cleavage efficiency and low off-target effects. This is challenging because many conditions must be considered, and empirically testing every design is time-consuming and costly. In silico prediction using machine learning models provides a high-performance alternative.

Results: We present CrisprBERT, a deep learning model that predicts the off-target effects of sgRNAs using only the sgRNAs and their paired DNA sequences. It combines a Bidirectional Encoder Representations from Transformers (BERT) architecture, which provides a high-dimensional embedding of the paired sgRNA and DNA sequences, with Bidirectional Long Short-Term Memory (BiLSTM) networks for learning. We propose doublet stack encoding to capture the local energy configuration of Cas9 binding and apply the BERT model to learn contextual embeddings of the doublet pairs. The new model outperformed state-of-the-art deep learning models in single-split and leave-one-sgRNA-out cross-validation as well as in independent testing.
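To make the encoding and evaluation ideas in the Results concrete, here is a minimal sketch and not the authors' implementation. It assumes that a "doublet" token is formed by pairing the sgRNA and DNA bases at each position and then joining two adjacent pairs, that both sequences are written in the DNA alphabet (A, C, G, T), and that each record is a tuple whose first element identifies the sgRNA. The names `VOCAB`, `doublet_tokens`, `encode`, and `leave_one_sgrna_out` are hypothetical; the actual tokenization and data layout in the CrisprBERT repository may differ.

```python
from itertools import product

BASES = "ACGT"  # assumption: sgRNA written with T instead of U, as in many datasets

# Hypothetical vocabulary: every combination of two adjacent (sgRNA, DNA) base pairs.
VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=4))}


def doublet_tokens(sgrna: str, dna: str) -> list[str]:
    """Pair the two sequences position by position, then join adjacent pairs."""
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA sequences must have equal length")
    # One two-letter pair per position: sgRNA base followed by DNA base.
    pairs = [g + d for g, d in zip(sgrna.upper(), dna.upper())]
    # A sequence of n positions yields n - 1 overlapping doublet tokens.
    return [pairs[i] + pairs[i + 1] for i in range(len(pairs) - 1)]


def encode(sgrna: str, dna: str) -> list[int]:
    """Map doublet tokens to integer ids suitable for an embedding layer."""
    return [VOCAB[token] for token in doublet_tokens(sgrna, dna)]


def leave_one_sgrna_out(records):
    """Yield (held_out_sgrna, train, test), holding out all rows of one sgRNA per fold."""
    guides = sorted({record[0] for record in records})
    for guide in guides:
        train = [r for r in records if r[0] != guide]
        test = [r for r in records if r[0] == guide]
        yield guide, train, test


if __name__ == "__main__":
    print(doublet_tokens("ACG", "ATG"))
    print(encode("ACG", "ATG"))
```

In this sketch the integer ids would feed an embedding layer (BERT in the paper), and holding out every row of one sgRNA per fold ensures the test guide is never seen during training. Characters outside A, C, G, T raise a `KeyError` in `encode`; handling ambiguous bases or bulges is left out.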

Availability and implementation: CrisprBERT is available on GitHub: https://github.com/OSsari/CrisprBERT.
