Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci.

IF 5.2 | Zone 3, Biology | JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY
Protein Science | Pub Date: 2025-09-01 | DOI: 10.1002/pro.70241
Eleftherios Bochalis, Michail Patsakis, Nikol Chantzi, Ioannis Mouratidis, Dionysios V Chartoumpekis, Ilias Georgakopoulos-Soares
Protein Science, volume 34, issue 9, e70241 (Journal Article) | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12375989/pdf/
Citations: 0

Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci.

The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development. Here, we introduce taxonomic quasi-primes, peptide k-mer sequences that are exclusively present in a specific taxonomy and absent from all others. By analyzing 24,073 reference proteomes, we identified these unique peptides at the superkingdom, kingdom, and phylum ranks. These sequences exhibit remarkable uniqueness at six- and seven-amino-acid lengths. For instance, the seven-mer SAPNYCY is found in 98.11% of eukaryotic species, while being completely absent from archaeal, bacterial, and viral reference proteomes. Functional analysis demonstrated that taxonomic quasi-prime-containing proteins are enriched for processes defining a lineage, such as synaptic signaling in Chordata. Structural analysis revealed that these peptides are preferentially located within proteins, participating directly in enzymatic active sites, mediating protein-protein interactions, and stabilizing ligand binding. Moreover, we show that in human proteins, highly conserved Chordata quasi-prime loci are 2.08-fold more likely to harbor pathogenic variants than surrounding regions, directly linking these evolutionary signatures to disease. This study establishes taxonomic quasi-primes as markers that illuminate evolutionary pathways and provide a powerful method for identifying functionally indispensable and disease-relevant loci, which warrant further therapeutic and diagnostic investigation.
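The definition in the abstract (a peptide k-mer found in proteomes of one taxon and in no proteome outside it) can be illustrated with a small Python sketch. This is one straightforward reading of that definition, not the authors' pipeline: the function names, the input layout (a dict of proteome IDs to protein sequences plus a proteome-to-taxon map) and the toy sequences are hypothetical. The taxon labels could stand for any of the ranks mentioned (superkingdom, kingdom, phylum). The reported prevalence is the share of a taxon's proteomes containing the k-mer, the same kind of figure quoted for SAPNYCY in eukaryotes.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping length-k substrings (peptide k-mers) of one protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def proteome_kmers(proteins: list[str], k: int) -> set[str]:
    """Union of k-mers over every protein in a proteome (presence/absence only)."""
    found: set[str] = set()
    for protein in proteins:
        found |= kmers(protein, k)
    return found


def taxonomic_quasi_primes(
    proteomes: dict[str, list[str]],
    taxon_of: dict[str, str],
    k: int,
) -> dict[str, dict[str, float]]:
    """Hypothetical helper: {taxon: {kmer: prevalence}} for k-mers seen in exactly one taxon.

    Prevalence is the fraction of that taxon's proteomes that contain the k-mer.
    """
    # kmer -> taxon -> number of proteomes of that taxon containing the kmer
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    proteomes_per_taxon: dict[str, int] = defaultdict(int)
    for proteome_id, proteins in proteomes.items():
        taxon = taxon_of[proteome_id]
        proteomes_per_taxon[taxon] += 1
        for kmer in proteome_kmers(proteins, k):
            counts[kmer][taxon] += 1

    result: dict[str, dict[str, float]] = defaultdict(dict)
    for kmer, by_taxon in counts.items():
        if len(by_taxon) == 1:  # present in one taxon, absent from all others
            taxon, n = next(iter(by_taxon.items()))
            result[taxon][kmer] = n / proteomes_per_taxon[taxon]
    return dict(result)


# Toy example with made-up sequences (not real proteomes).
proteomes = {
    "euk1": ["MSAPNYCYK"],
    "euk2": ["ASAPNYCYL"],
    "bac1": ["APNYCYKLV"],
}
taxon_of = {"euk1": "Eukaryota", "euk2": "Eukaryota", "bac1": "Bacteria"}
qp = taxonomic_quasi_primes(proteomes, taxon_of, k=7)
print(qp["Eukaryota"]["SAPNYCY"])  # 1.0: in both eukaryotic proteomes, in no bacterial one
print("APNYCYK" in qp["Eukaryota"])  # False: this 7-mer also occurs in bac1
```

Holding every 7-mer of 24,073 proteomes in Python dictionaries would be impractical (there are 20^7, about 1.28 × 10^9, possible 7-mers over the standard amino acids); a full-scale run would presumably need compact integer encodings or disk-backed counting, which this sketch does not attempt.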

Source journal: Protein Science (Biology / Biochemistry & Molecular Biology)
CiteScore: 12.40
Self-citation rate: 1.20%
Articles published: 246
Review time: 1 month
About the journal: Protein Science, the flagship journal of The Protein Society, is a publication that focuses on advancing fundamental knowledge in the field of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution. Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics. The journal accepts manuscript submissions in any suitable format for review, with the requirement of converting the manuscript to journal-style format only upon acceptance for publication. Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).