TopCysteineDB: A Cysteinome-wide Database Integrating Structural and Chemoproteomics Data for Cysteine Ligandability Prediction

IF 4.7, Zone 2 (Biology), JCR Q1 (Biochemistry & Molecular Biology)

Journal of Molecular Biology, Volume 437, Issue 15, Article 169196. Pub Date: 2025-05-08. DOI: 10.1016/j.jmb.2025.169196

Michele Bonus, Julian Greb, Jaimeen D. Majmudar, Markus Boehm, Magdalena Korczynska, Azadeh Nazemi, Alan M. Mathiowetz, Holger Gohlke

Citations: 0

Abstract

The development of targeted covalent inhibitors and covalent ligand-first approaches has emerged as a powerful strategy in drug design, with cysteines being attractive targets due to their nucleophilicity and relative scarcity. While structural biology and chemoproteomics approaches have generated extensive data on cysteine ligandability, these complementary data types remain largely disconnected. Here, we present TopCysteineDB, a comprehensive resource integrating structural information from the PDB with chemoproteomics data from activity-based protein profiling experiments. Analysis of the complete PDB yielded 264,234 unique cysteines, while the proteomics dataset encompasses 41,898 detectable cysteines across the human proteome. Using TopCovPDB, an automated classification pipeline complemented by manual curation, we identified 787 covalent cysteines and systematically categorized other functional roles, including metal-binding, cofactor-binding, and disulfide bonds. Mapping residue-wise structural information to sequence space enabled cross-referencing between structural and proteomics data, creating a unified view of cysteine ligandability. For TopCySPAL, a machine learning model was developed, integrating structural features and proteomics data, achieving strong predictive performance (AUROC: 0.964, AUPRC: 0.914) and robust generalization to novel cases. TopCysteineDB and TopCySPAL are freely accessible through a web interface, TopCysteineDBApp (https://topcysteinedb.hhu.de/), designed to facilitate exploration of cysteine sites across the human proteome. The interface provides an interactive visualization featuring a color-coded mapping of chemoproteomics data onto cysteine site structures and the highlighting of identified peptide sequences. It offers customizable dataset downloads and ligandability predictions for user-provided structures. This resource advances targeted covalent inhibitor design by providing integrated access to previously dispersed data types and enabling systematic analysis and prediction of cysteine ligandability.
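For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how AUROC and AUPRC can be computed for a binary classifier. It is an illustration only, not the authors' evaluation code: the abstract does not say how the values were obtained, the function names `auroc` and `average_precision` are hypothetical, the labels and scores are invented toy values, and average precision is just one common estimator of the area under the precision-recall curve.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.

    Assumes no tied scores; ties would need to be grouped before ranking.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (invented): 1 = ligandable cysteine, 0 = not ligandable.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10]

print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"AUPRC (average precision): {average_precision(labels, scores):.3f}")
```

AUROC is insensitive to class balance, whereas AUPRC depends on the fraction of positives, which is presumably why both are reported for a task where ligandable cysteines are a minority.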

Source journal

Journal of Molecular Biology (Biology: Biochemistry and Molecular Biology)

CiteScore: 11.30
Self-citation rate: 1.80%
Articles published: 412
Review time: 28 days

About the journal: Journal of Molecular Biology (JMB) provides high-quality, comprehensive and broad coverage in all areas of molecular biology. The journal publishes original scientific research papers that provide mechanistic and functional insights and report a significant advance to the field. The journal encourages the submission of multidisciplinary studies that use complementary experimental and computational approaches to address challenging biological questions. Research areas include but are not limited to: Biomolecular interactions, signaling networks, systems biology; Cell cycle, cell growth, cell differentiation; Cell death, autophagy; Cell signaling and regulation; Chemical biology; Computational biology, in combination with experimental studies; DNA replication, repair, and recombination; Development, regenerative biology, mechanistic and functional studies of stem cells; Epigenetics, chromatin structure and function; Gene expression; Membrane processes, cell surface proteins and cell-cell interactions; Methodological advances, both experimental and theoretical, including databases; Microbiology, virology, and interactions with the host or environment; Microbiota mechanistic and functional studies; Nuclear organization; Post-translational modifications, proteomics; Processing and function of biologically important macromolecules and complexes; Molecular basis of disease; RNA processing, structure and functions of non-coding RNAs, transcription; Sorting, spatiotemporal organization, trafficking; Structural biology; Synthetic biology; Translation, protein folding, chaperones, protein degradation and quality control.