DeepMEns: an ensemble model for predicting sgRNA on-target activity based on multiple features.

IF 2.5 · Zone 3 (Biology) · JCR Q3 (BIOTECHNOLOGY & APPLIED MICROBIOLOGY)
Shumei Ding, Jia Zheng, Cangzhi Jia
Briefings in Functional Genomics. Journal Article. Published 2024-11-11. DOI: 10.1093/bfgp/elae043 (https://doi.org/10.1093/bfgp/elae043)

Abstract

The CRISPR/Cas9 system developed from Streptococcus pyogenes (SpCas9) has high potential in gene editing. However, its successful application is hindered by the considerable variability in target efficiency across different single guide RNAs (sgRNAs). Although several deep learning models have been created to predict sgRNA on-target activity, the intrinsic mechanisms of these models are difficult to explain, and there is still scope for improvement in prediction performance. To overcome these issues, we propose DeepMEns, an interpretable ensemble model based on deep learning, to predict sgRNA on-target activity. Using five different training and validation datasets, we constructed five sub-regressors, each comprising three parts. The first part uses one-hot encoding, in which a 0-1 representation of the secondary structure serves as the input to a convolutional neural network (CNN) with a Transformer encoder. The second part uses the DNA shape feature matrix as the input to a CNN with a Transformer encoder. The third part uses positional encoding feature matrices as the input to a long short-term memory network with an attention mechanism. The outputs of these three parts are concatenated through a flatten layer, and the final prediction is the average of the five sub-regressors. Extensive benchmarking experiments indicated that DeepMEns achieved the highest Spearman correlation coefficient on 6 of 10 independent test datasets compared with previous predictors, which confirms that DeepMEns can achieve state-of-the-art performance. Moreover, the ablation analysis indicated that the ensemble strategy may improve the performance of the prediction model.

Source journal
Briefings in Functional Genomics (BIOTECHNOLOGY & APPLIED MICROBIOLOGY; GENETICS & HEREDITY)
CiteScore: 6.30
Self-citation rate: 2.50%
Articles published: 37
Review time: 6-12 weeks
Journal description: Briefings in Functional Genomics publishes high-quality, peer-reviewed articles that focus on the use, development or exploitation of genomic approaches, and their application to all areas of biological research. As well as exploring thematic areas where these techniques and protocols are being used, articles review the impact that these approaches have had, or are likely to have, on their field. Subjects covered by the Journal include but are not restricted to: the identification and functional characterisation of coding and non-coding features in genomes, microarray technologies, gene expression profiling, next generation sequencing, pharmacogenomics, phenomics, SNP technologies, transgenic systems, mutation screens and genotyping. Articles range in scope and depth from the introductory level to specific details of protocols and analyses, encompassing bacterial, fungal, plant, animal and human data. The editorial board welcomes the submission of review articles for publication. The essential criteria for publication are that papers do not contain primary data, and that they are high-quality, clearly written review articles which provide a balanced, highly informative and up-to-date perspective to researchers in the field of functional genomics.