TEMC-Cas: Accurate Cas Protein Classification via Combined Contrastive Learning and Protein Language Models.

Impact factor 3.9; Zone 2 (Biology); JCR Q1 (Biochemical Research Methods)
Xingyu Liao, Yanyan Li, Yingfu Wu, Long Wen, Minghui Jing, Bolin Chen, Xingyi Li, Xuequn Shang
ACS Synthetic Biology, published 2025-10-16. DOI: 10.1021/acssynbio.5c00631 (https://doi.org/10.1021/acssynbio.5c00631)

Abstract

Accurate classification of Cas proteins is crucial for understanding CRISPR-Cas systems and developing genome-editing tools. Here, we present TEMC-Cas, a deep learning framework for Cas protein classification that combines a fine-tuned ESM protein language model with contrastive learning. Unlike traditional methods that rely on sequence similarity (e.g., BLAST, HMMs) or structural prediction, TEMC-Cas leverages evolutionary-scale modeling to capture distant homology while employing contrastive learning to distinguish closely related subtypes. The framework incorporates low-rank adaptation (LoRA) for parameter-efficient fine-tuning and addresses class imbalance through weighted loss functions. TEMC-Cas achieves superior performance in classifying the Cas1-Cas13 families and 17 Cas12 subtypes, demonstrating particular strength in identifying remote homologs. This approach provides a robust tool for the discovery of CRISPR-Cas systems and expands the toolbox for genome engineering applications. TEMC-Cas is freely accessible at https://github.com/Xingyu-Liao/TEMC-Cas.
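
The abstract names two training ingredients, a weighted loss for class imbalance and contrastive learning for separating close subtypes, without giving their exact form. The following is a minimal sketch of how such losses are commonly written, not the authors' implementation. It assumes inverse-frequency class weights and a supervised-contrastive (SupCon-style) objective; the function names, the temperature value, and the weighting scheme are all hypothetical choices made for illustration. LoRA and the ESM model itself are not covered here.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight each class by N / (K * count) so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(logits, labels, weights):
    """Per-sample cross-entropy scaled by class weight, normalised by total weight."""
    total, norm = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += weights[y] * (log_z - row[y])
        norm += weights[y]
    return total / norm


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: same-label embeddings are pulled together, others pushed apart."""
    def normalise(v):
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    z = [normalise(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        # Scaled cosine similarity between anchor i and every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

In a setup like the one described, the embeddings would come from the fine-tuned protein language model and the two losses would typically be summed with a mixing coefficient; how TEMC-Cas combines them is not stated in the abstract.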

Source journal
CiteScore: 8.00
Self-citation rate: 10.60%
Articles published: 380
Review time: 6-12 weeks
About the journal: The journal is particularly interested in studies on the design and synthesis of new genetic circuits and gene products; computational methods in the design of systems; and integrative applied approaches to understanding disease and metabolism. Topics may include, but are not limited to: design and optimization of genetic systems; genetic circuit design and the principles for their organization into programs; computational methods to aid the design of genetic systems; experimental methods to quantify genetic parts, circuits, and metabolic fluxes; genetic parts libraries: their creation, analysis, and ontological representation; protein engineering, including computational design; metabolic engineering and cellular manufacturing, including biomass conversion; natural product access, engineering, and production; creative and innovative applications of cellular programming; medical applications, tissue engineering, and the programming of therapeutic cells; minimal cell design and construction; genomics and genome replacement strategies; viral engineering; automated and robotic assembly platforms for synthetic biology; DNA synthesis methodologies; metagenomics and synthetic metagenomic analysis; bioinformatics applied to gene discovery, chemoinformatics, and pathway construction; gene optimization; methods for genome-scale measurements of transcription and metabolomics; systems biology and methods to integrate multiple data sources; in vitro and cell-free synthetic biology and molecular programming; nucleic acid engineering.