Attribute-Weighted Aggregation of Tandem Mass Reporter Ion Intensity for Protein Quantification Using Isobaric Labeling

IF 3.6, Zone 2 (Biology), Q1, Biochemical Research Methods

Journal of Proteome Research. Pub Date: 2025-09-23. DOI: 10.1021/acs.jproteome.4c00992

Jiahua Tan, Gian L. Negri, Gregg B. Morin*, and David D. Y. Chen*

Volume 24, Issue 10, pages 4875–4887. Publication type: Journal Article. Full text: https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00992

Citations: 0

Abstract

Isobaric labeling is a commonly used technique in proteomics. In bottom-up proteomics, protein abundance estimation requires combining reporter ion intensities from the corresponding peptide-spectrum matches (PSMs), a process referred to as aggregation. It is usually assumed that PSMs in this step represent protein abundance equally, but the differences in ionizability and propensity for isolation interference result in different levels of quantitative accuracy for PSMs. This work developed an attribute-weighted aggregation (AWA) method that considers PSM attributes with reporter ion intensities to provide a more accurate estimate of protein abundance. A random forest model was trained on the characteristics of PSMs using three spike-in data sets and used to predict the quantitative inaccuracy of PSMs to be aggregated based on their attributes. These PSMs were aggregated to the protein level based on the predicted inaccuracy. AWA was evaluated using the three spike-in data sets and applied to two large cancer cohorts. The results showed that applying AWA to different data sets can lead to better recall in differential expression analyses while maintaining high precision. To facilitate the application of AWA, an R package AWAggregator was developed, which also offers functions to retrain the random forest model for additional or alternative spike-in data sets.
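The aggregation step described above can be illustrated with a minimal sketch. It is not the AWAggregator implementation: the abstract does not give the weighting formula, so the inverse-inaccuracy weight used here, the function name `aggregate_weighted`, and the field names `protein`, `intensities`, and `predicted_inaccuracy` are all assumptions made for illustration. The predicted inaccuracy values are simply supplied as inputs in place of the random forest predictions.

```python
from collections import defaultdict


def aggregate_weighted(psms, eps=1e-6):
    """Aggregate PSM reporter ion intensities to the protein level.

    Each PSM is a dict with the (hypothetical) keys:
      'protein'              - protein identifier
      'intensities'          - one reporter ion intensity per channel
      'predicted_inaccuracy' - non-negative score; higher means less trusted

    Assumed weighting scheme (not taken from the paper):
      weight = 1 / (predicted_inaccuracy + eps)
    Returns a dict mapping each protein to its weighted-mean intensity
    per channel.
    """
    weighted_sums = {}
    weight_totals = defaultdict(float)
    for psm in psms:
        # eps avoids division by zero when the predicted inaccuracy is 0.
        weight = 1.0 / (psm["predicted_inaccuracy"] + eps)
        sums = weighted_sums.setdefault(
            psm["protein"], [0.0] * len(psm["intensities"])
        )
        for channel, intensity in enumerate(psm["intensities"]):
            sums[channel] += weight * intensity
        weight_totals[psm["protein"]] += weight
    return {
        protein: [s / weight_totals[protein] for s in sums]
        for protein, sums in weighted_sums.items()
    }


# Two PSMs of one protein across two channels. The second PSM has a
# compressed ratio (as isolation interference would cause) and a high
# predicted inaccuracy, so it contributes little to the estimate.
example_psms = [
    {"protein": "P1", "intensities": [100.0, 200.0], "predicted_inaccuracy": 0.1},
    {"protein": "P1", "intensities": [100.0, 100.0], "predicted_inaccuracy": 0.9},
]

if __name__ == "__main__":
    print(aggregate_weighted(example_psms))
```

With equal weights the second channel would average to 150, giving a 1.5-fold ratio. Down-weighting the less reliable PSM pulls the estimate toward the 2-fold ratio reported by the more trusted PSM, which is the intuition behind weighting by predicted inaccuracy.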



Source journal: Journal of Proteome Research (Biology: Biochemical Research Methods)

CiteScore: 9.00

Self-citation rate: 4.50%

Articles published: 251

Review time: 3 months

Journal description: Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".