Sequence-driven species identification of ZooMS collagen peptide mass fingerprints

Impact factor: 2.8 | CAS Zone 2 (Biology) | JCR Q2, BIOCHEMICAL RESEARCH METHODS

Journal of Proteomics | Pub Date: 2025-08-22 | DOI: 10.1016/j.jprot.2025.105525

Toby Lawrence, Michael Buckley

Journal of Proteomics, volume 321, Article 105525. Full text: https://www.sciencedirect.com/science/article/pii/S1874391925001526

Citations: 0

Abstract


Biomolecular species identification of animal tissues has been under development for decades, and collagen peptide mass fingerprinting has been increasingly used in recent years. However, establishing confidence in the species biomarkers within these fingerprints requires sequence assignment, usually done via LC-ESI-MS/MS-based approaches and correlation with sequence databases. This study develops an approach that allows collagen fingerprints to be matched to sequence databases directly. To do so we create theoretical spectra from in silico digests of publicly available sequences, which are then filtered using previously collected proteomic sequence data. These data indicate the likely number of collagen post-translational modifications, vastly reducing the number of peaks in the theoretical spectra and so making overlapping peptide signals and false positives less likely. We retrieved a database containing 211 mammals and tested the approach with spectra of 29 modern reference species and 98 archaeological examples from 10 different families, some with taxa represented in the sequence database and others without. The approach was found to be at least 93% accurate for predicting the correct family in both modern and archaeological spectra, and capable of species-level identification in some cases. This sequence-driven analysis allows rapid comparison across whole spectra, rather than across small sets of markers for a particular taxon. This removes human error from manual identification and, unlike machine-learning methods, ensures that the selected markers derive from the protein of interest.
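The digest-and-filter idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes tryptic cleavage with no missed cleavages, treats proline hydroxylation as the only modification, and uses a hypothetical `allowed_hydroxy` dictionary as a stand-in for the rules derived from LC-MS/MS data. The toy sequence is invented and is not a real collagen sequence.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056       # added once per peptide for the free termini
PROTON = 1.00728       # singly charged ion, as typically observed in MALDI
HYDROXYLATION = 15.99491  # mass shift of one proline hydroxylation (+O)


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (assumed: no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide, n_hydroxy=0):
    """Monoisotopic [M+H]+ of a peptide carrying n_hydroxy hydroxylations."""
    residues = sum(RESIDUE_MASS[aa] for aa in peptide)
    return residues + WATER + PROTON + n_hydroxy * HYDROXYLATION


def theoretical_peaks(sequence, allowed_hydroxy=None):
    """Return (peptide, n_hydroxy, m/z) tuples for an in silico digest.

    allowed_hydroxy is a hypothetical mapping {peptide: set of counts}.
    When given, only the listed hydroxylation counts are kept; peptides
    missing from the mapping are assumed to be unmodified.
    """
    peaks = []
    for pep in tryptic_digest(sequence):
        counts = range(pep.count("P") + 1)  # every possible count, 0..nP
        if allowed_hydroxy is not None:
            permitted = allowed_hydroxy.get(pep, {0})
            counts = [n for n in counts if n in permitted]
        peaks.extend((pep, n, round(mh_plus(pep, n), 4)) for n in counts)
    return peaks


toy_sequence = "GPPGPKGAR"  # invented example, not a real collagen sequence
unfiltered = theoretical_peaks(toy_sequence)
filtered = theoretical_peaks(toy_sequence, {"GPPGPK": {1}})
print(len(unfiltered), len(filtered))
```

Without a filter, every proline-containing peptide contributes one peak per possible hydroxylation count, so the theoretical spectrum grows quickly. Restricting each peptide to the counts assumed to have been observed shrinks the peak list, which is the effect the abstract credits with reducing overlapping signals and false positives.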

Significance

Species identification using collagen peptide mass fingerprinting is an increasingly popular MALDI-based mass spectrometric method, largely because it relies on the dominant protein in the most enduring of biological tissues, bone and tooth dentine. This endurance is of great significance to fields such as bioarchaeology and palaeontology, but the method also applies to processed foodstuffs, for which proteomics-based species identification has been ongoing for decades. However, as demand grows, a wider range of vertebrate species is being investigated, making biomarker selection more challenging. Although DNA-based gene sequence information has been a cornerstone of probability-based proteomic inference, its use in fingerprint analysis for species identification has remained indirect: tools such as Mascot's PMF search application may be suitable for protein identification but often struggle with species-level inferences. Here we introduce a means to create post-translational modification rules based on observations in LC-MS/MS data, generating improved in silico PMFs from DNA-based sequences that greatly reduce the search space and improve confidence in the taxonomic interpretation of PMFs. This is applicable beyond collagen to any protein for which species identification is needed.
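Matching a measured fingerprint against in silico PMFs for several taxa could look roughly like the sketch below. It is an illustration only: the scoring (fraction of observed peaks matched within a ppm tolerance) is a simple assumption and not the scoring used in the paper, the function names and the 50 ppm default are hypothetical, and all m/z values and taxon labels are invented.

```python
def count_matches(observed_mz, theoretical_mz, tol_ppm=50.0):
    """Count observed peaks lying within tol_ppm of any theoretical peak."""
    hits = 0
    for obs in observed_mz:
        if any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical_mz):
            hits += 1
    return hits


def rank_taxa(observed_mz, theoretical_by_taxon, tol_ppm=50.0):
    """Rank taxa by the fraction of observed peaks they explain (assumed score)."""
    if not observed_mz:
        return []
    scores = {
        taxon: count_matches(observed_mz, peaks, tol_ppm) / len(observed_mz)
        for taxon, peaks in theoretical_by_taxon.items()
    }
    # Highest-scoring taxon first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented peak lists for illustration; these are not real collagen markers.
observed = [1000.50, 1500.75, 2000.00]
candidates = {
    "taxon_A": [1000.51, 1500.76, 2500.00],
    "taxon_B": [1000.50, 1800.00],
}
ranking = rank_taxa(observed, candidates)
print(ranking[0][0])
```

Because every observed peak is compared with the full theoretical peak list of every taxon, the comparison covers whole spectra rather than a hand-picked marker set. A smaller theoretical peak list per taxon lowers the chance that an observed peak matches by coincidence.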


Source journal

Journal of Proteomics (Biology: Biochemical Research Methods)

CiteScore: 7.10

Self-citation rate: 3.00%

Articles published: 227

Review time: 73 days

Journal description: Journal of Proteomics is aimed at protein scientists and analytical chemists in the fields of proteomics, biomarker discovery, protein analytics, plant proteomics, microbial and animal proteomics, human studies, tissue imaging by mass spectrometry, non-conventional and non-model organism proteomics, and protein bioinformatics. The journal welcomes papers in new and upcoming areas such as metabolomics, genomics, systems biology, toxicogenomics, and pharmacoproteomics. Journal of Proteomics unifies both fundamental scientists and clinicians, and includes translational research. Suggestions for reviews, webinars and thematic issues are welcome.