Michael Russelle S. Alvarez, Xavier A. Holmes, Armin Oloumi, Sheryl Joyce Grijaldo-Alvarez, Ryan Schindler, Qingwen Zhou, Anirudh Yadlapati, Atit Silsirivanit, Carlito B. Lebrilla
Chemical Science, published 2025-04-04. DOI: 10.1039/d5sc00467e
Integration of RNAseq transcriptomics and N-glycomics reveal biosynthetic pathways and predict structure-specific N-glycan expression
The processes involved in protein N-glycosylation represent new therapeutic targets for diseases, but their stepwise and overlapping biosynthesis makes it challenging to identify the specific glycogenes involved. In this work, we aimed to elucidate the interactions between glycogene expression and N-glycan abundance by constructing supervised machine-learning models for each N-glycan composition. Regression models were trained to predict N-glycan abundance (response variable) from glycogene expression (predictors) using paired LC-MS/MS N-glycomic and 3′-TagSeq transcriptomic datasets from cells of multiple tissue origins and treatment conditions. The datasets include cells of B cell, brain, colon, lung, muscle and prostate origin, encompassing nearly 400 N-glycan compounds and over 160 glycogenes filtered from an 18 000-gene transcriptome. Accurate models (validation R² > 0.8) predicted N-glycan abundance across cell types, including GLC01 (lung cancer), CCD19-Lu (lung fibroblast), and Tib-190 (B cell). Model importance scores ranked glycogene contributions to N-glycan predictions, revealing significant associations between glycogenes and specific N-glycan types. The predictions were consistent across input cell quantities, unlike LC-MS/MS glycomics, which gave inconsistent results. This suggests that the models can reliably predict N-glycosylation even in samples with low cell numbers and, by extension, in single-cell samples. These findings can provide insights into the cellular N-glycosylation machinery and may inform therapeutic strategies for diseases linked to aberrant glycosylation, such as cancer and neurodegenerative and autoimmune disorders.
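The abstract describes one supervised regression model per N-glycan composition, with glycogene expression as predictors, validation R² as the accuracy criterion, and importance scores ranking glycogene contributions. The sketch below illustrates that setup under stated assumptions only: the abstract does not name the regression algorithm, so ridge regression stands in for it, absolute standardized coefficients stand in for the model importance scores, and the gene names (`GENE_A` and so on), glycan labels and synthetic data are hypothetical placeholders, not values from the study.

```python
"""Sketch: one regression model per N-glycan composition (illustrative only).

Assumptions, not taken from the paper: ridge regression as the model,
absolute standardized coefficients as importance scores, a simple
train/validation split, and synthetic data with hypothetical names.
"""
import random
from statistics import mean, pstdev


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X, y, lam=1.0):
    """Fit ridge regression on standardized glycogene expression values."""
    p = len(X[0])
    cols = list(zip(*X))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant genes
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(p)] for row in X]
    y_mu = mean(y)
    yc = [v - y_mu for v in y]
    # Normal equations with an L2 penalty: (Z^T Z + lam * I) beta = Z^T yc
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, yc)) for i in range(p)]
    return {"mu": mu, "sd": sd, "y_mu": y_mu, "beta": solve(A, b)}


def predict(model, X):
    """Predict glycan abundance for each sample (row of gene expression)."""
    return [model["y_mu"] + sum(b * (x - m) / s for b, x, m, s in
                                zip(model["beta"], row, model["mu"], model["sd"]))
            for row in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def importance(model, genes):
    """Rank genes by absolute standardized coefficient (stand-in importance)."""
    scores = zip(genes, (abs(b) for b in model["beta"]))
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Synthetic example: 60 samples x 3 hypothetical glycogenes.
rng = random.Random(0)
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [[rng.uniform(0, 10) for _ in genes] for _ in range(60)]
glycans = {  # hypothetical glycan abundances driven by known genes plus noise
    "glycan_1": [2.0 * x[0] + 0.5 * x[1] + rng.gauss(0, 0.5) for x in X],
    "glycan_2": [1.5 * x[2] + rng.gauss(0, 0.5) for x in X],
}

# One model per glycan: train on 40 samples, validate on the remaining 20.
results = {}
for name, y in glycans.items():
    model = fit_ridge(X[:40], y[:40])
    val_r2 = r_squared(y[40:], predict(model, X[40:]))
    results[name] = (val_r2, importance(model, genes))

for name, (val_r2, ranking) in results.items():
    print(name, round(val_r2, 3), ranking[0][0])
```

Because each glycan gets its own independent model, the loop over `glycans` would stay the same if the ridge step were replaced by another regressor with its own importance measure.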
About the journal:
Chemical Science is a journal that encompasses various disciplines within the chemical sciences. It publishes ground-breaking research that has significant implications for its field and also appeals to a wider audience in related areas. To be considered for publication, articles must showcase innovative and original advances in their field of study and be presented in a manner that is understandable to scientists from diverse backgrounds. The journal generally does not publish highly specialized research.