TEMC-Cas: Accurate Cas Protein Classification via Combined Contrastive Learning and Protein Language Models
Authors: Xingyu Liao, Yanyan Li, Yingfu Wu, Long Wen, Minghui Jing, Bolin Chen, Xingyi Li, Xuequn Shang
Journal: ACS Synthetic Biology (impact factor 3.9; JCR Q1, Biochemical Research Methods)
Published: 2025-10-16
DOI: https://doi.org/10.1021/acssynbio.5c00631
Citation count: 0
Abstract
Accurate classification of Cas proteins is crucial for understanding CRISPR-Cas systems and developing genome-editing tools. Here, we present TEMC-Cas, a deep learning framework for Cas protein classification that combines a fine-tuned ESM protein language model with contrastive learning. Unlike traditional methods that rely on sequence similarity (e.g., BLAST, HMMs) or structural prediction, TEMC-Cas leverages evolutionary-scale modeling to capture distant homology while employing contrastive learning to distinguish closely related subtypes. The framework incorporates LoRA (low-rank adaptation) for parameter-efficient fine-tuning and addresses class imbalance through weighted loss functions. TEMC-Cas achieves superior performance in classifying the Cas1-Cas13 families and 17 Cas12 subtypes, demonstrating particular strength in identifying remote homology. This approach provides a robust tool for the discovery of CRISPR-Cas systems and expands the toolbox for genome engineering applications. TEMC-Cas is freely accessible at https://github.com/Xingyu-Liao/TEMC-Cas.
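The abstract names two training ingredients, a class-weighted loss and contrastive learning, without giving formulas. The following is a minimal sketch of one common way to realize each idea, not the authors' implementation. The function names, the inverse-frequency weighting scheme, the supervised contrastive form of the loss, and the temperature value are all assumptions made for illustration, and the toy 2-D vectors stand in for embeddings that a protein language model such as ESM would produce.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted mean of -log softmax(logits)[label]."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        log_z = m + math.log(sum(math.exp(v - m) for v in z))  # stable log-sum-exp
        total += weights[y] * (log_z - z[y])
        norm += weights[y]
    return total / norm


def _unit(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull same-label embeddings together and push other labels apart."""
    e = [_unit(v) for v in embeddings]
    n = len(e)
    sim = [[sum(a * b for a, b in zip(e[i], e[j])) / temperature
            for j in range(n)] for i in range(n)]
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        others = [sim[i][j] for j in range(n) if j != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        losses.append(-sum(sim[i][j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: class 0 is three times as common as class 1.
toy_labels = [0, 0, 0, 1]
toy_weights = inverse_frequency_weights(toy_labels)
toy_logits = [[2.0, 0.5], [1.5, 0.2], [0.3, 0.9], [0.1, 1.2]]
print(toy_weights)
print(weighted_cross_entropy(toy_logits, toy_labels, toy_weights))

toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1]))
```

In a setup like the one the abstract describes, the two terms would typically be summed with a mixing coefficient and minimized while only the low-rank adapter parameters are updated. How TEMC-Cas actually combines them is not stated in the abstract.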
About the journal:
The journal is particularly interested in studies on the design and synthesis of new genetic circuits and gene products; computational methods in the design of systems; and integrative applied approaches to understanding disease and metabolism.
Topics may include, but are not limited to:
Design and optimization of genetic systems
Genetic circuit design and the principles for organizing circuits into programs
Computational methods to aid the design of genetic systems
Experimental methods to quantify genetic parts, circuits, and metabolic fluxes
Genetic parts libraries: their creation, analysis, and ontological representation
Protein engineering including computational design
Metabolic engineering and cellular manufacturing, including biomass conversion
Natural product access, engineering, and production
Creative and innovative applications of cellular programming
Medical applications, tissue engineering, and the programming of therapeutic cells
Minimal cell design and construction
Genomics and genome replacement strategies
Viral engineering
Automated and robotic assembly platforms for synthetic biology
DNA synthesis methodologies
Metagenomics and synthetic metagenomic analysis
Bioinformatics applied to gene discovery, chemoinformatics, and pathway construction
Gene optimization
Methods for genome-scale measurements of transcription and metabolomics
Systems biology and methods to integrate multiple data sources
In vitro and cell-free synthetic biology and molecular programming
Nucleic acid engineering.