Yuanjie Su, Chang Jiang, Ziyue Yang, Shisheng Sun, Junying Zhang
Title: Core fucose identification in glycoproteomics: an ML approach addressing fucose migration in mass spectrometry.
Journal: Bioinformatics Advances, volume 5, issue 1, article vbaf186. Published 2025-09-15.
DOI: https://doi.org/10.1093/bioadv/vbaf186
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448375/pdf/
Abstract
Motivation: Core fucosylation is a common type of glycosylation that plays a significant role in biological functions. Accurate identification of core-fucosylated glycopeptides is challenging because of the fucose migration phenomenon that occurs during mass spectrometry. Using glycopeptides from mouse brain with FUT8 knocked out as cases and core-fucosylated high-mannose glycans from normal mouse brain as controls, the phenomenon is shown to be widely observed in mass spectrometry data. The relative intensities of 10 core-related characteristic ions are used jointly as a feature vector; a semisupervised model and a self-supervised model are developed in this feature space, and the robustness of both models is studied.
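As a rough illustration of the feature-vector idea described above, the following Python sketch turns one spectrum into a vector of relative intensities for a list of target ions. It is not the authors' implementation: the function name `relative_intensity_features`, the ppm matching tolerance, the base-peak normalization, and the placeholder m/z values are all assumptions made for illustration, and the abstract does not list the 10 characteristic ions.

```python
from typing import List, Sequence


def relative_intensity_features(
    mz: Sequence[float],
    intensity: Sequence[float],
    target_mz: Sequence[float],
    tol_ppm: float = 20.0,  # assumed matching tolerance, not from the paper
) -> List[float]:
    """Return one relative intensity per target ion (0.0 if the ion is absent)."""
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")

    # Assumption: intensities are normalized to the spectrum's base peak.
    base_peak = max(intensity, default=0.0)

    features: List[float] = []
    for target in target_mz:
        tol = target * tol_ppm * 1e-6  # absolute tolerance in m/z units
        # Keep the strongest peak that falls inside the tolerance window.
        matched = [i for m, i in zip(mz, intensity) if abs(m - target) <= tol]
        best = max(matched, default=0.0)
        features.append(best / base_peak if base_peak > 0 else 0.0)
    return features


# Toy spectrum with placeholder values (hypothetical, not real glycopeptide ions).
spectrum_mz = [100.0, 200.0, 300.0]
spectrum_intensity = [50.0, 200.0, 100.0]
placeholder_targets = [100.0, 300.0, 400.0]

features = relative_intensity_features(
    spectrum_mz, spectrum_intensity, placeholder_targets
)
print(features)  # [0.25, 0.5, 0.0]
```

In practice the target list would hold the 10 core-related characteristic ions computed for each glycopeptide, and the resulting vectors would be the input to a classifier.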
Results: Experimental results show that both models perform well, with the semisupervised model outperforming the self-supervised model and reaching 99.95% identification accuracy on an independent FUT8-knockout mouse brain dataset. Applying the models to wild-type mouse brain, human IgG, and human serum reveals the dominant abundance of core fucose and/or noncore fucose in each; these findings are trustworthy because the effect of fucose migration is accounted for. The study highlights the importance of trustworthy data labeling, well-defined features, and machine learning/deep learning techniques for reliable, accurate, and robust identification of core fucose from high-throughput mass spectrometry data.
Availability and implementation: The code for core fucose identification is freely available at https://github.com/yzy-010203/core_focuse_identification.