Attribute-Weighted Aggregation of Tandem Mass Reporter Ion Intensity for Protein Quantification Using Isobaric Labeling
Jiahua Tan, Gian L. Negri, Gregg B. Morin*, and David D. Y. Chen*
Journal of Proteome Research, 24(10), pages 4875–4887. Published 2025-09-23. DOI: 10.1021/acs.jproteome.4c00992
https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00992
Abstract
Isobaric labeling is a commonly used technique in proteomics. In bottom-up proteomics, protein abundance estimation requires combining reporter ion intensities from the corresponding peptide-spectrum matches (PSMs), a process referred to as aggregation. It is usually assumed that all PSMs in this step represent protein abundance equally well, but differences in ionizability and in propensity for isolation interference give PSMs different levels of quantitative accuracy. This work developed an attribute-weighted aggregation (AWA) method that considers PSM attributes alongside reporter ion intensities to provide a more accurate estimate of protein abundance. A random forest model was trained on PSM characteristics from three spike-in data sets and then used to predict, from their attributes, the quantitative inaccuracy of the PSMs to be aggregated. These PSMs were then aggregated to the protein level based on the predicted inaccuracy. AWA was evaluated using the three spike-in data sets and applied to two large cancer cohorts. The results showed that applying AWA to different data sets can lead to better recall in differential expression analyses while maintaining high precision. To facilitate the application of AWA, an R package, AWAggregator, was developed, which also offers functions to retrain the random forest model for additional or alternative spike-in data sets.
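The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the AWAggregator implementation and does not use that package's API. The abstract does not give the weighting formula, so the inverse-inaccuracy weights, the log2-scale weighted mean, the function name `aggregate_psms`, and the `eps` parameter are all assumptions made for illustration. The predicted inaccuracy values are taken as given, standing in for the output of the random forest model.

```python
import math
from typing import List, Sequence


def aggregate_psms(
    intensities: Sequence[Sequence[float]],
    predicted_inaccuracy: Sequence[float],
    eps: float = 1e-6,
) -> List[float]:
    """Aggregate PSM reporter ion intensities into one protein-level profile.

    intensities: one row per PSM, one column per reporter channel (all > 0).
    predicted_inaccuracy: one non-negative value per PSM; larger means the
        PSM is expected to be less quantitatively accurate.
    eps: small constant that avoids division by zero (hypothetical parameter).

    Assumption: weights are proportional to 1 / (inaccuracy + eps), and the
    protein value per channel is the weighted mean of log2 intensities,
    transformed back to the linear scale.
    """
    if len(intensities) == 0:
        raise ValueError("at least one PSM is required")
    if len(intensities) != len(predicted_inaccuracy):
        raise ValueError("need exactly one inaccuracy value per PSM")

    # Less accurate PSMs receive smaller weights; weights sum to 1.
    raw = [1.0 / (max(e, 0.0) + eps) for e in predicted_inaccuracy]
    total = sum(raw)
    weights = [w / total for w in raw]

    n_channels = len(intensities[0])
    protein = []
    for c in range(n_channels):
        # Weighted mean on the log2 scale for this reporter channel.
        log_mean = sum(
            w * math.log2(row[c]) for w, row in zip(weights, intensities)
        )
        protein.append(2.0 ** log_mean)
    return protein


# Two PSMs, two reporter channels; the second PSM is predicted to be noisier.
profile = aggregate_psms([[100.0, 200.0], [400.0, 800.0]], [0.1, 0.9])
print(profile)
```

With equal predicted inaccuracy for every PSM, this sketch reduces to an unweighted geometric mean per channel, which corresponds to the equal-representation assumption that the abstract argues against.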
About the journal:
Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".