Sequence-driven species identification of ZooMS collagen peptide mass fingerprints
Toby Lawrence, Michael Buckley
Journal of Proteomics, volume 321, Article 105525. Published 2025-08-22. DOI: 10.1016/j.jprot.2025.105525
https://www.sciencedirect.com/science/article/pii/S1874391925001526
Biomolecular methods for identifying the species of animal tissues have been developing for decades, and collagen peptide mass fingerprinting has become increasingly widely used in recent years. However, establishing confidence in the species biomarkers within these fingerprints requires sequence assignment, usually done via LC-ESI-MS/MS-based approaches and correlation with sequence databases. This study develops an approach that allows collagen fingerprints to be matched to sequence databases directly. To do so, we create theoretical spectra from in silico digests of publicly available sequences, which are then filtered using previously collected proteomic sequence data. These data indicate the likely number of collagen post-translational modifications, vastly reducing the number of peaks in the theoretical spectra and so making overlapping peptide signals and false positives less likely. We retrieved a database containing 211 mammals and tested this approach with spectra of 29 modern reference species and 98 archaeological examples from 10 different families, some whose taxa were represented in the sequence database and others whose taxa were not. The approach was at least 93% accurate in predicting the correct family in both modern and archaeological spectra, and capable of species-level identification in some cases. This sequence-driven analysis allows rapid comparison across whole spectra, rather than small sets of markers for a particular taxon; this removes human error from manual identification and, unlike machine-learning methods, ensures that the selected markers derive from the protein of interest.
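The pipeline described in the abstract (in silico digest, theoretical spectrum, comparison with an observed fingerprint) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes trypsin-like cleavage after K or R with no missed cleavages, treats proline hydroxylation (+15.9949 Da) as the modification of interest, and scores a match as the fraction of observed peaks explained within a fixed tolerance. The abstract specifies none of these choices, and all function names and the `hydroxy_rule` parameter are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056          # added once per peptide (free termini)
PROTON = 1.00728          # singly charged ions, as is typical in MALDI
HYDROXYLATION = 15.99491  # mass shift of one added oxygen (assumed PTM)


def tryptic_digest(sequence):
    """Cleave after every K or R (simplified: no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide, n_hydroxy=0):
    """Monoisotopic [M+H]+ m/z of a peptide carrying n_hydroxy hydroxylations."""
    backbone = sum(RESIDUE_MASS[residue] for residue in peptide)
    return backbone + WATER + PROTON + n_hydroxy * HYDROXYLATION


def theoretical_peaks(sequence, hydroxy_rule):
    """Build a theoretical peak list for one sequence.

    hydroxy_rule is a hypothetical callable mapping a peptide to the
    hydroxylation counts to allow, e.g. derived from earlier LC-MS/MS data.
    """
    return [
        mh_plus(peptide, n)
        for peptide in tryptic_digest(sequence)
        for n in hydroxy_rule(peptide)
    ]


def match_fraction(observed, theoretical, tol_da=0.2):
    """Fraction of observed peaks within tol_da of any theoretical peak."""
    if not observed:
        return 0.0
    hits = sum(
        any(abs(o - t) <= tol_da for t in theoretical) for o in observed
    )
    return hits / len(observed)
```

Under these assumptions, an identification would compute `match_fraction` for each database sequence and rank the taxa by score. The scoring used in the study may well differ.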
Significance
Collagen peptide mass fingerprinting (PMF) is a MALDI-based mass spectrometric method for species identification that is becoming increasingly popular, largely because it relies on the dominant protein in the most enduring of biological tissues, bone and tooth dentine. This endurance is of great significance to fields such as bioarchaeology and palaeontology, but the method also applies to processed foodstuffs, for which proteomics-based species identification has been ongoing for decades. However, as demand grows, a wider range of vertebrate species is being investigated, making biomarker selection more challenging. Although DNA-based gene sequence information has been a cornerstone of probability-based proteomic inference, its use in fingerprint analysis for species identification has remained indirect: tools such as Mascot's PMF search application may be suitable for protein identification but often struggle with species-level inferences. Here we introduce a means of creating post-translational modification rules from observations in LC-MS/MS data, in order to generate improved in silico PMFs from DNA-based sequences; this greatly reduces the search space and lends confidence to matches used in taxonomic interpretation of PMFs. The approach is applicable beyond collagen to any protein for which species identification is needed.
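The following Python sketch shows how modification rules learned from LC-MS/MS observations could shrink the search space. It is a toy model, not the rule scheme used in the study. It assumes that rules are keyed by the number of prolines in a peptide and record which hydroxylation counts were actually observed. The peptide strings and counts are invented for illustration, and all function names are hypothetical.

```python
from collections import defaultdict


def learn_rules(observations):
    """Map proline count -> set of hydroxylation counts seen in LC-MS/MS data.

    observations: iterable of (peptide, n_hydroxylations) pairs.
    Keying rules by proline count is an assumption made for this sketch.
    """
    rules = defaultdict(set)
    for peptide, n_hydroxy in observations:
        rules[peptide.count("P")].add(n_hydroxy)
    return dict(rules)


def allowed_counts(peptide, rules):
    """Hydroxylation counts to generate for one peptide."""
    n_pro = peptide.count("P")
    # Without a matching rule, fall back to every count from 0 to n_pro.
    return sorted(rules.get(n_pro, range(n_pro + 1)))


def count_peaks(peptides, rules=None):
    """Number of theoretical peaks generated for a list of peptides."""
    rules = rules or {}
    return sum(len(allowed_counts(peptide, rules)) for peptide in peptides)


# Invented toy inputs, not real identifications.
toy_observations = [
    ("GPPGPAGK", 2),
    ("GEPGPPGR", 2),
    ("GEPGPPGR", 3),
    ("GAPGAK", 1),
]
toy_peptides = ["GPPGPAGK", "GAPGAK", "GEPGPPGR"]

toy_rules = learn_rules(toy_observations)
exhaustive = count_peaks(toy_peptides)            # every count 0..n_pro
filtered = count_peaks(toy_peptides, toy_rules)   # only observed counts
print(f"peaks without rules: {exhaustive}, with rules: {filtered}")
```

In this toy case the rules halve the number of candidate peaks. Fewer peaks mean fewer chance overlaps between theoretical spectra of different taxa, which is the effect the text attributes to the modification rules.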
About the journal:
Journal of Proteomics is aimed at protein scientists and analytical chemists in the field of proteomics, biomarker discovery, protein analytics, plant proteomics, microbial and animal proteomics, human studies, tissue imaging by mass spectrometry, non-conventional and non-model organism proteomics, and protein bioinformatics. The journal welcomes papers in new and upcoming areas such as metabolomics, genomics, systems biology, toxicogenomics, pharmacoproteomics.
Journal of Proteomics unifies both fundamental scientists and clinicians, and includes translational research. Suggestions for reviews, webinars and thematic issues are welcome.