Kunyi Li, Baozhen Shan, Lei Xin, Ming Li, Lusheng Wang
Nature Computational Science. Published 2025-10-03. DOI: 10.1038/s43588-025-00880-z (https://doi.org/10.1038/s43588-025-00880-z)
Proteoform search from protein database with top-down mass spectra.
Here we propose a search algorithm for proteoform identification that computes the largest-size error-correction alignments between a protein mass graph and a spectrum mass graph. Our combined method uses a filtering algorithm to identify candidates and then applies a search algorithm to report the final results. Our exact search method is 3.9 to 9.0 times faster than popular methods such as TopMG and TopPIC. Our combined method can further reduce the running time of sTopMG without affecting the search accuracy. We develop a pipeline for generating simulated top-down spectra on the basis of input protein sequences with modifications. Experiments on simulated datasets show that our combined method has 95% accuracy, which exceeds that of existing methods. Experiments on real annotated datasets show that our method has ≥97.1% accuracy when using the deconvolution method FLASHDeconv.
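The filter-then-search structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no details of the error-correction alignment on mass graphs, so the second stage here is a much simpler stand-in that allows at most one unknown mass shift. All function names (`prefix_masses`, `count_matches`, `filter_candidates`, `single_shift_score`, `identify`), the tolerance of 0.01 Da, and the `keep` parameter are hypothetical choices made for illustration only.

```python
from bisect import bisect_left

# Monoisotopic residue masses in Da for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
}


def prefix_masses(seq):
    """Cumulative residue masses of every proper prefix of seq."""
    total, masses = 0.0, []
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def count_matches(theoretical, observed, shift=0.0, tol=0.01):
    """Count theoretical masses that, after adding shift, lie within tol
    of some mass in the sorted list observed."""
    hits = 0
    for mass in theoretical:
        target = mass + shift
        i = bisect_left(observed, target - tol)
        if i < len(observed) and observed[i] <= target + tol:
            hits += 1
    return hits


def filter_candidates(database, observed, keep=2):
    """Stage 1 (cheap): rank proteins by unshifted matches, keep the best."""
    observed = sorted(observed)
    ranked = sorted(
        database,
        key=lambda name: count_matches(prefix_masses(database[name]), observed),
        reverse=True,
    )
    return ranked[:keep]


def single_shift_score(seq, observed, tol=0.01):
    """Stage 2 (slower): best match count when one unknown mass shift is
    allowed to start at any prefix position. A simplified stand-in, not
    the alignment method of the paper."""
    observed = sorted(observed)
    theo = prefix_masses(seq)
    best = count_matches(theo, observed, 0.0, tol)
    for k in range(len(theo)):
        unshifted = count_matches(theo[:k], observed, 0.0, tol)
        for obs in observed:
            # Assume the shift is whatever maps theo[k] onto this peak.
            shifted = count_matches(theo[k:], observed, obs - theo[k], tol)
            best = max(best, unshifted + shifted)
    return best


def identify(database, observed, keep=2):
    """Run the filter, then score only the surviving candidates."""
    candidates = filter_candidates(database, observed, keep)
    return max(candidates,
               key=lambda name: single_shift_score(database[name], observed))
```

As a usage example, one could build a toy database such as `{"P1": "GASPVTLK", "P2": "KLTVPSAG", "P3": "GGGGGGGG"}`, simulate a spectrum by adding a 42.01057 Da shift to the later prefix masses of `P1`, and call `identify(database, spectrum)`. The point of the two-stage design is that the expensive scoring runs only on the few proteins that survive the cheap filter.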