Ibaqpy: A scalable Python package for baseline quantification in proteomics leveraging SDRF metadata

IF 2.8 | CAS Zone 2 (Biology) | JCR Q2 (Biochemical Research Methods)

Journal of Proteomics, Volume 317, Article 105440. Pub Date: 2025-04-21. DOI: 10.1016/j.jprot.2025.105440

Ping Zheng, Enrique Audain, Henry Webel, Chengxin Dai, Joshua Klein, Marc-Phillip Hitz, Timo Sachsenberg, Mingze Bai, Yasset Perez-Riverol

Full text: https://www.sciencedirect.com/science/article/pii/S1874391925000673

Abstract

Intensity-based absolute quantification (iBAQ) is essential in proteomics as it allows for the assessment of a protein's absolute abundance in various samples or conditions. However, the computation of these values for increasingly large-scale and high-throughput experiments, such as those using DIA, TMT, or LFQ workflows, poses significant challenges in scalability and reproducibility. Here, we present ibaqpy (https://github.com/bigbio/ibaqpy), a Python package designed to compute iBAQ values efficiently for experiments of any scale. Ibaqpy leverages the Sample and Data Relationship Format (SDRF) metadata standard to incorporate experimental metadata into the quantification workflow. This allows for automatic normalization and batch correction while accounting for key aspects of the experimental design, such as technical and biological replicates, fractionation strategies, and sample conditions. Designed for large-scale proteomics datasets, ibaqpy can also recompute iBAQ values for existing experiments when an SDRF is available. We showcased ibaqpy's capabilities by reanalyzing 17 public proteomics datasets from ProteomeXchange, covering HeLa cell lines with 4921 samples and 5766 MS runs, quantifying a total of 11,014 proteins. In our reanalysis, ibaqpy is a key component in automating reproducible quantification, reducing manual effort and making quantitative proteomics more accessible while supporting FAIR principles for data reuse.
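To make the quantity concrete: iBAQ is conventionally defined as the summed intensity of a protein's identified peptides divided by the number of its theoretically observable peptides. The sketch below is a minimal standard-library illustration of that ratio, not ibaqpy's implementation. The function names, the simplified trypsin rule (cleave after every K or R, ignoring the proline exception and missed cleavages) and the 6 to 30 residue length window are all assumptions chosen for illustration.

```python
def tryptic_peptides(sequence):
    """Cleave after every K or R (simplified trypsin rule, an assumption)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


def count_observable_peptides(sequence, min_len=6, max_len=30):
    """Count in-silico peptides inside an assumed observable length window."""
    return sum(
        1 for peptide in tryptic_peptides(sequence)
        if min_len <= len(peptide) <= max_len
    )


def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the number of observable peptides."""
    n_observable = count_observable_peptides(sequence)
    if n_observable == 0:
        return None  # no observable peptides, so iBAQ is undefined here
    return sum(peptide_intensities) / n_observable


# Toy protein: two peptides fall inside the length window, one ("DDK") does not.
toy_sequence = "AAAAAAKCCCCCCCRDDK"
toy_ibaq = ibaq([100.0, 300.0], toy_sequence)
```

Dividing by the observable-peptide count is what makes values comparable between proteins of different length: a long protein yields more peptides and therefore more summed signal at the same molar amount.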

Significance

Proteomics studies often rely on intensity-based absolute quantification (iBAQ) to assess protein abundance across various biological conditions. Despite its widespread use, computing iBAQ values at scale remains challenging due to the increasing complexity and volume of proteomics experiments. Existing tools frequently lack metadata integration, limiting their ability to handle experimental design intricacies such as replicates, fractions, and batch effects. Our work introduces ibaqpy, a scalable Python package that leverages the Sample and Data Relationship Format (SDRF) to compute iBAQ values efficiently while incorporating critical experimental metadata. By enabling automated normalization and batch correction, ibaqpy ensures reproducible and comparable quantification across large-scale datasets.
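How experimental-design metadata can drive aggregation may be sketched as follows. SDRF is a tab-delimited table, and the column names used here (`source name`, `comment[data file]`, `comment[fraction identifier]`) follow the SDRF-Proteomics convention. Everything else is hypothetical: the toy table, the per-run intensities and the helper names are invented for illustration, and summing fractions followed by division by the per-sample median is one simple normalization choice, not necessarily the one ibaqpy applies.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Toy SDRF fragment: sample1 was split into two fractions, sample2 was not.
SDRF_TEXT = (
    "source name\tcomment[data file]\tcomment[fraction identifier]\n"
    "sample1\trun_a.raw\t1\n"
    "sample1\trun_b.raw\t2\n"
    "sample2\trun_c.raw\t1\n"
)

# Hypothetical per-run protein intensities keyed by data file name.
RUN_INTENSITIES = {
    "run_a.raw": {"P1": 10.0, "P2": 4.0},
    "run_b.raw": {"P1": 30.0, "P3": 8.0},
    "run_c.raw": {"P1": 20.0, "P2": 20.0, "P3": 5.0},
}


def runs_by_sample(sdrf_text):
    """Map each sample (source name) to the MS runs that belong to it."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        mapping[row["source name"]].append(row["comment[data file]"])
    return dict(mapping)


def sum_fractions(sample_runs, run_intensities):
    """Add up protein intensities over all runs (fractions) of a sample."""
    totals = {}
    for sample, runs in sample_runs.items():
        per_protein = defaultdict(float)
        for run in runs:
            for protein, intensity in run_intensities[run].items():
                per_protein[protein] += intensity
        totals[sample] = dict(per_protein)
    return totals


def median_normalize(sample_totals):
    """Divide each sample's intensities by that sample's median intensity."""
    normalized = {}
    for sample, proteins in sample_totals.items():
        sample_median = median(proteins.values())
        normalized[sample] = {
            protein: value / sample_median for protein, value in proteins.items()
        }
    return normalized


sample_runs = runs_by_sample(SDRF_TEXT)
totals = sum_fractions(sample_runs, RUN_INTENSITIES)
normalized = median_normalize(totals)
```

The point of the sketch is that the run-to-sample mapping comes from the metadata table rather than from file-naming conventions, so the same code handles fractionated and unfractionated samples without per-dataset adjustments.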

We validated the utility of ibaqpy through the reanalysis of 17 public HeLa datasets, comprising over 200 million peptide features and quantifying approximately 11,000 proteins across thousands of samples. This comprehensive reanalysis highlights the robustness and scalability of ibaqpy, making it an essential tool for researchers conducting large-scale proteomics experiments. Moreover, by promoting FAIR principles for data reuse and interoperability, ibaqpy offers a transformative approach to baseline protein quantification, supporting reproducible research and data integration within the proteomics community.

Source journal

Journal of Proteomics (Biology: Biochemical Research Methods)

CiteScore: 7.10
Self-citation rate: 3.00%
Articles published: 227
Review time: 73 days

About the journal: Journal of Proteomics is aimed at protein scientists and analytical chemists in the field of proteomics, biomarker discovery, protein analytics, plant proteomics, microbial and animal proteomics, human studies, tissue imaging by mass spectrometry, non-conventional and non-model organism proteomics, and protein bioinformatics. The journal welcomes papers in new and upcoming areas such as metabolomics, genomics, systems biology, toxicogenomics, and pharmacoproteomics. Journal of Proteomics unifies both fundamental scientists and clinicians, and includes translational research. Suggestions for reviews, webinars and thematic issues are welcome.