Ibaqpy: A scalable Python package for baseline quantification in proteomics leveraging SDRF metadata
Ping Zheng, Enrique Audain, Henry Webel, Chengxin Dai, Joshua Klein, Marc-Phillip Hitz, Timo Sachsenberg, Mingze Bai, Yasset Perez-Riverol
Journal of Proteomics, Volume 317, Article 105440. Published 2025-04-21. DOI: 10.1016/j.jprot.2025.105440
Abstract
Intensity-based absolute quantification (iBAQ) is essential in proteomics because it estimates a protein's absolute abundance across samples or conditions. However, computing these values for increasingly large-scale, high-throughput experiments, such as those using DIA, TMT, or LFQ workflows, poses significant challenges in scalability and reproducibility. Here, we present ibaqpy (https://github.com/bigbio/ibaqpy), a Python package designed to compute iBAQ values efficiently for experiments of any scale. Ibaqpy leverages the Sample and Data Relationship Format (SDRF) metadata standard to incorporate experimental metadata into the quantification workflow. This allows automatic normalization and batch correction while accounting for key aspects of the experimental design, such as technical and biological replicates, fractionation strategies, and sample conditions. Designed for large-scale proteomics datasets, ibaqpy can also recompute iBAQ values for existing experiments when an SDRF is available. We showcased ibaqpy's capabilities by reanalyzing 17 public HeLa cell line proteomics datasets from ProteomeXchange, comprising 4921 samples and 5766 MS runs and quantifying a total of 11,014 proteins. In this reanalysis, ibaqpy was a key component in automating reproducible quantification, reducing manual effort, and making quantitative proteomics more accessible, while supporting FAIR principles for data reuse.
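For readers new to the metric, iBAQ divides the summed intensity of all peptides assigned to a protein by the number of peptides that protein could theoretically yield. The sketch below illustrates that standard definition only; it is not ibaqpy's implementation. It assumes a fully specific in-silico trypsin digest with no missed cleavages (cut after K or R unless followed by P) and treats peptides of 7 to 30 residues as observable. These cut-offs, the riBAQ helper, and all function names are illustrative assumptions.

```python
from __future__ import annotations

import math
import re

# Illustrative assumption: peptides of 7-30 residues count as "observable".
MIN_LEN, MAX_LEN = 7, 30


def tryptic_peptides(sequence: str) -> list[str]:
    """Fully specific in-silico trypsin digest without missed cleavages.

    Cuts after K or R unless the next residue is P.
    """
    # Zero-width split (Python 3.7+): positions preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def observable_peptide_count(sequence: str) -> int:
    """Count digest products whose length falls inside the observable window."""
    return sum(MIN_LEN <= len(p) <= MAX_LEN for p in tryptic_peptides(sequence))


def ibaq(peptide_intensities: list[float], sequence: str) -> float:
    """iBAQ = summed peptide intensity / number of observable peptides."""
    n_observable = observable_peptide_count(sequence)
    if n_observable == 0:
        return math.nan  # no peptide in the window: iBAQ is undefined
    return sum(peptide_intensities) / n_observable


def ribaq(ibaq_values: dict[str, float]) -> dict[str, float]:
    """Relative iBAQ: each protein's share of the summed iBAQ in one sample."""
    total = sum(v for v in ibaq_values.values() if not math.isnan(v))
    return {protein: v / total for protein, v in ibaq_values.items()}


if __name__ == "__main__":
    # Hypothetical toy protein; digests to AAAAAAAK, CCCCCCCR, DDK, FFFKPGGGGR, EEEEEEEEE.
    protein = "AAAAAAAK" + "CCCCCCCR" + "DDK" + "FFFKPGGGGR" + "EEEEEEEEE"
    print(observable_peptide_count(protein))  # 4 (DDK is too short)
    print(ibaq([1e6, 2e6, 1e6], protein))  # 1000000.0
```

Because the denominator depends on the enzyme rule, the allowed missed cleavages, and the length window, iBAQ values are only directly comparable when these settings match.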
Significance
Proteomics studies often rely on intensity-based absolute quantification (iBAQ) to assess protein abundance across various biological conditions. Despite its widespread use, computing iBAQ values at scale remains challenging due to the increasing complexity and volume of proteomics experiments. Existing tools frequently lack metadata integration, limiting their ability to handle experimental design intricacies such as replicates, fractions, and batch effects. Our work introduces ibaqpy, a scalable Python package that leverages the Sample and Data Relationship Format (SDRF) to compute iBAQ values efficiently while incorporating critical experimental metadata. By enabling automated normalization and batch correction, ibaqpy ensures reproducible and comparable quantification across large-scale datasets.
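To make the role of SDRF concrete, the sketch below reads a toy SDRF-style table, maps each raw file to its sample, fraction, and technical replicate, sums intensities across the fractions of one technical replicate, averages the technical replicates, and then applies a simple median normalization across samples. The column names follow the SDRF-Proteomics convention (`source name`, `comment[data file]`, `comment[fraction identifier]`, `comment[technical replicate]`). The aggregation rules, the median normalization (a stand-in that does not perform batch correction), and all helper names are illustrative assumptions, not ibaqpy's actual pipeline.

```python
from __future__ import annotations

import csv
import io
import statistics
from collections import defaultdict

# Toy SDRF-style table (tab-separated); real SDRF files carry many more columns.
SDRF_TSV = (
    "source name\tcomment[data file]\tcomment[fraction identifier]\tcomment[technical replicate]\n"
    "HeLa_1\trun1.raw\t1\t1\n"
    "HeLa_1\trun2.raw\t2\t1\n"
    "HeLa_1\trun3.raw\t1\t2\n"
    "HeLa_1\trun4.raw\t2\t2\n"
    "HeLa_2\trun5.raw\t1\t1\n"
)


def read_run_design(sdrf_text: str) -> dict[str, tuple[str, str, str]]:
    """Map each data file to (sample, fraction, technical replicate)."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return {
        row["comment[data file]"]: (
            row["source name"],
            row["comment[fraction identifier]"],
            row["comment[technical replicate]"],
        )
        for row in reader
    }


def aggregate_to_samples(
    run_intensities: dict[str, dict[str, float]],
    design: dict[str, tuple[str, str, str]],
) -> dict[str, dict[str, float]]:
    """Sum fractions within each technical replicate, then average the replicates."""
    per_replicate: dict = defaultdict(lambda: defaultdict(float))
    for run, proteins in run_intensities.items():
        sample, _fraction, replicate = design[run]
        for protein, value in proteins.items():
            per_replicate[(sample, replicate)][protein] += value  # fractions add up
    replicates_by_sample: dict = defaultdict(list)
    for (sample, _replicate), proteins in per_replicate.items():
        replicates_by_sample[sample].append(proteins)
    result = {}
    for sample, replicates in replicates_by_sample.items():
        proteins = sorted(set().union(*replicates))
        # Assumption: average only over replicates in which the protein was seen.
        result[sample] = {
            p: statistics.mean(r[p] for r in replicates if p in r) for p in proteins
        }
    return result


def median_normalize(samples: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Rescale each sample so its median intensity equals the mean of the sample medians."""
    medians = {s: statistics.median(v.values()) for s, v in samples.items()}
    target = statistics.mean(medians.values())
    return {
        s: {p: x * target / medians[s] for p, x in v.items()}
        for s, v in samples.items()
    }


if __name__ == "__main__":
    design = read_run_design(SDRF_TSV)
    # Hypothetical protein intensities per raw file.
    runs = {
        "run1.raw": {"P1": 100.0, "P2": 300.0},
        "run2.raw": {"P1": 100.0, "P2": 100.0},
        "run3.raw": {"P1": 300.0, "P2": 500.0},
        "run4.raw": {"P1": 100.0, "P2": 100.0},
        "run5.raw": {"P1": 50.0, "P2": 150.0},
    }
    samples = aggregate_to_samples(runs, design)
    print(samples)  # {'HeLa_1': {'P1': 300.0, 'P2': 500.0}, 'HeLa_2': {'P1': 50.0, 'P2': 150.0}}
    print(median_normalize(samples))  # {'HeLa_1': {'P1': 187.5, 'P2': 312.5}, 'HeLa_2': {'P1': 125.0, 'P2': 375.0}}
```

The point of the design is that no run-to-sample bookkeeping is hard-coded: fractions and technical replicates are resolved entirely from the metadata table, so the same code applies unchanged to any dataset whose SDRF declares these columns.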
We validated the utility of ibaqpy through the reanalysis of 17 public HeLa datasets, comprising over 200 million peptide features and quantifying more than 11,000 proteins across thousands of samples. This comprehensive reanalysis highlights the robustness and scalability of ibaqpy, making it an essential tool for researchers conducting large-scale proteomics experiments. Moreover, by promoting FAIR principles for data reuse and interoperability, ibaqpy offers a transformative approach to baseline protein quantification, supporting reproducible research and data integration within the proteomics community.