Eleftherios Bochalis, Michail Patsakis, Nikol Chantzi, Ioannis Mouratidis, Dionysios V Chartoumpekis, Ilias Georgakopoulos-Soares
Protein Science, volume 34, issue 9, article e70241. Published 2025-09-01. Publication type: Journal Article.
DOI: 10.1002/pro.70241 (https://doi.org/10.1002/pro.70241)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12375989/pdf/
Journal impact factor: 5.2; JCR Q1 (Biochemistry & Molecular Biology).
Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci.
The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development. Here, we introduce taxonomic quasi-primes, peptide k-mer sequences that are exclusively present in a specific taxonomy and absent from all others. By analyzing 24,073 reference proteomes, we identified these unique peptides at the superkingdom, kingdom, and phylum ranks. These sequences exhibit remarkable uniqueness at six- and seven-amino-acid lengths. For instance, the seven-mer SAPNYCY is found in 98.11% of eukaryotic species, while being completely absent from archaeal, bacterial, and viral reference proteomes. Functional analysis demonstrated that taxonomic quasi-prime-containing proteins are enriched for processes defining a lineage, such as synaptic signaling in Chordata. Structural analysis revealed that these peptides are preferentially located within proteins, participating directly in enzymatic active sites, mediating protein-protein interactions, and stabilizing ligand binding. Moreover, we show that in human proteins, highly conserved Chordata quasi-prime loci are 2.08-fold more likely to harbor pathogenic variants than surrounding regions, directly linking these evolutionary signatures to disease. This study establishes taxonomic quasi-primes as markers that illuminate evolutionary pathways and provide a powerful method for identifying functionally indispensable and disease-relevant loci, which warrant further therapeutic and diagnostic investigation.
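The definition above (a k-mer present in one taxonomy and absent from all others, scored by the share of that taxonomy's species carrying it) can be illustrated with a minimal sketch. This is not the authors' pipeline: the nested-dictionary layout, the function names `kmers` and `taxonomic_quasi_primes`, and the toy sequences are all assumptions made up for illustration, and k is set to 3 only to keep the example small.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def taxonomic_quasi_primes(proteomes, k):
    """Find k-mers that occur in exactly one taxon.

    proteomes: {taxon: {species: [protein sequences]}} (hypothetical layout).
    Returns {taxon: {kmer: fraction of that taxon's species containing it}}.
    """
    taxa_with = defaultdict(set)     # kmer -> taxa in which it occurs
    species_with = defaultdict(set)  # (taxon, kmer) -> species containing it
    for taxon, species_map in proteomes.items():
        for species, proteins in species_map.items():
            for protein in proteins:
                for kmer in kmers(protein, k):
                    taxa_with[kmer].add(taxon)
                    species_with[(taxon, kmer)].add(species)

    result = {taxon: {} for taxon in proteomes}
    for kmer, taxa in taxa_with.items():
        if len(taxa) == 1:  # present in one taxon, absent from all others
            (taxon,) = taxa
            n_species = len(proteomes[taxon])
            result[taxon][kmer] = len(species_with[(taxon, kmer)]) / n_species
    return result


# Invented toy sequences, not real proteome data.
toy = {
    "Eukaryota": {"sp1": ["MKTAYW"], "sp2": ["GGTAYWL"]},
    "Bacteria": {"sp3": ["MKTLLA"], "sp4": ["AYWQ"]},
}
result = taxonomic_quasi_primes(toy, k=3)
for taxon, hits in result.items():
    for kmer, coverage in sorted(hits.items()):
        print(taxon, kmer, f"{coverage:.0%}")
```

In this toy input, `TAY` occurs in both eukaryotic species and in no bacterial one, so it is reported with full coverage, whereas `AYW` and `MKT` occur in both taxa and are excluded. At the scale described in the abstract (24,073 reference proteomes), holding every k-mer in memory this way would likely be impractical, so a real analysis would need a more compact representation.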
About the journal:
Protein Science, the flagship journal of The Protein Society, publishes work that advances fundamental knowledge of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution.
Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics.
The journal accepts manuscript submissions in any suitable format for review; conversion to journal-style format is required only upon acceptance for publication.
Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).