Accurate Prediction of CRISPR/Cas13a Guide Activity Using Feature Selection and Deep Learning

IF 5.3; Zone 2 (Chemistry); JCR Q1 (Chemistry, Medicinal)
Jiashun Fu, Xuyang Liu, Ruijie Deng, Xiue Jiang, Wensheng Cai, Haohao Fu* and Xueguang Shao*
Journal of Chemical Information and Modeling, 65 (7), 3380–3387. Journal article, published 2025-03-17. DOI: 10.1021/acs.jcim.4c02438. https://pubs.acs.org/doi/10.1021/acs.jcim.4c02438

Abstract

CRISPR/Cas13a is a key tool for nucleic acid testing, so accurate prediction of its activity is essential for developing robust and sensitive diagnostics. In this study, we build a dual-branch neural network model that achieves high prediction accuracy and classification performance on two independent CRISPR/Cas13a data sets, outperforming previously published models that rely solely on sequence features. The model integrates direct sequence encoding with descriptive features; statistical analysis reduces 1553 candidate descriptive features to 99 key features that critically influence guide–target interactions and Cas13a guide activity. Using Shapley Additive Explanations and Integrated Gradients for feature importance analysis, we show that sequence composition, mismatch type and frequency, and the protospacer flanking site region are the primary features. These findings underscore the importance of descriptive features as complementary inputs to deep learning-based encoding and provide insight into the mechanisms underlying guide–target interaction. Overall, this study introduces a reliable and efficient model for Cas13a guide activity prediction and offers a foundation for future rational design efforts.
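
As a rough illustration of the two kinds of input the abstract describes (direct sequence encoding and descriptive features), the following Python sketch one-hot encodes a guide and computes a few hand-picked descriptors for it. Everything here is an assumption made for illustration: the helper names, the toy sequences, and the chosen descriptors (GC fraction, mismatch count, frequency, and type) are hypothetical stand-ins, not the 99 features selected in the study or its actual encoding. Mismatches are counted against the reverse complement of the target, and G-U wobble pairing is ignored for simplicity.

```python
from collections import Counter

BASES = "ACGU"  # column order used by the one-hot encoding
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def one_hot(seq):
    """Sequence-branch input: one 4-element row per nucleotide (A, C, G, U)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def reverse_complement(rna):
    """Return the perfectly matching guide for a target, read 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def descriptive_features(guide, target):
    """Descriptor-branch input: a few illustrative, hand-picked features."""
    perfect_guide = reverse_complement(target)
    if len(guide) != len(perfect_guide):
        raise ValueError("guide and target must have the same length")
    # Record each mismatch as "expected>actual" at the guide position.
    mismatches = Counter(
        f"{expected}>{actual}"
        for expected, actual in zip(perfect_guide, guide)
        if expected != actual
    )
    n = len(guide)
    total = sum(mismatches.values())
    return {
        "gc_fraction": sum(b in "GC" for b in guide) / n,
        "mismatch_count": total,
        "mismatch_frequency": total / n,
        "mismatch_types": dict(mismatches),
    }


# Toy 8-nt example (hypothetical sequences, not data from the study).
target = "AUGGCUAC"
guide = "GUAGCCAA"  # the perfectly matching guide would be GUAGCCAU
x_seq = one_hot(guide)                        # would feed the sequence branch
x_desc = descriptive_features(guide, target)  # would feed the descriptor branch
```

In a dual-branch design, `x_seq` and a numeric vector built from `x_desc` would be processed by separate sub-networks whose outputs are merged before the final prediction layer; how the merge is done in the published model is not stated in the abstract.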


Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.