Integration of diverse bioactivity data into the Chemical Checker compound universe
Arnau Comajuncosa-Creus, Martino Bertoni, Miquel Duran-Frigola, Adrià Fernández-Torras, Oriol Guitart-Pla, Nils Kurzawa, Martina Locatelli, Yasmmin Martins, Elena Pareja-Lorente, Gema Rojas-Granado, Nicolas Soler, Eva Viesi, Patrick Aloy
Nature Protocols, published 21 May 2025. DOI: 10.1038/s41596-025-01167-3 (https://doi.org/10.1038/s41596-025-01167-3)
Abstract
Chemical signatures encode the physicochemical and structural properties of small molecules into numerical descriptors, forming the basis for chemical comparisons and search algorithms. The increasing availability of bioactivity data has allowed compound representations to incorporate biological effects (for example, induced gene expression changes), although bioactivity descriptors are often limited to a few well-documented molecules. To address this issue, we implemented a collection of deep neural networks that leverage the experimentally determined bioactivity data associated with small molecules to infer the missing bioactivity signatures for any compound of interest. However, unlike static chemical descriptors, these bioactivity signatures evolve dynamically as new data and processing strategies become available. Here we present a computational protocol to modify or generate novel bioactivity spaces and signatures. It describes the main steps needed to integrate diverse bioactivity data with current knowledge, as catalogued in the Chemical Checker (CC; https://chemicalchecker.org/), using the predefined data curation pipeline. We illustrate the protocol through four specific examples: the incorporation of new compounds into an existing bioactivity space, a change in data preprocessing without altering the underlying experimental data, and the creation of two novel bioactivity spaces from scratch. These examples are completed in under 9 h using graphics processing unit (GPU) computing. Overall, this protocol offers a guideline for installing, testing and running the CC data integration approach on user-provided data. It extends the annotations available for a limited number of small molecules to a larger chemical landscape and generates novel bioactivity signatures.
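Because signatures are numerical vectors, comparing two compounds reduces to computing a similarity between vectors, and a search becomes a ranking by that similarity. The sketch below illustrates only this generic idea. It makes several assumptions:

- Cosine similarity is used, as one common choice of measure.
- The ranking is a brute-force nearest-neighbour search.
- The compound names (`cpd_A`, `cpd_B`, `cpd_C`) and their four-value vectors are hypothetical toy data.
- The helper functions are illustrative and are not part of the Chemical Checker's own software or API.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "signatures": short numerical descriptor vectors per compound.
# Values are illustrative only and do not come from the Chemical Checker.
SIGNATURES: Dict[str, List[float]] = {
    "cpd_A": [0.9, 0.1, 0.0, 0.3],
    "cpd_B": [0.8, 0.2, 0.1, 0.4],
    "cpd_C": [0.0, 0.9, 0.8, 0.1],
}


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare an all-zero signature")
    return dot / (norm_u * norm_v)


def nearest_neighbours(
    query: str, signatures: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank all other compounds by cosine similarity to the query (brute force)."""
    q = signatures[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in signatures.items()
        if name != query
    ]
    # Highest similarity first, then keep the top k hits.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    for name, score in nearest_neighbours("cpd_A", SIGNATURES):
        print(f"{name}\t{score:.3f}")
```

With these toy vectors, `cpd_B` ranks well above `cpd_C` as a neighbour of `cpd_A`. A brute-force scan is used only for clarity. Searches over large compound collections would typically rely on an indexed or approximate nearest-neighbour method instead.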
About the journal:
Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured.
The specific techniques covered encompass a wide range, including but not limited to: Biochemistry; Cell biology; Cell culture; Chemical modification; Computational biology; Developmental biology; Epigenomics; Genetic analysis; Genetic modification; Genomics; Imaging; Immunology; Isolation, purification and separation; Lipidomics; Metabolomics; Microbiology; Model organisms; Nanotechnology; Neuroscience; Nucleic-acid-based molecular biology; Pharmacology; Plant biology; Protein analysis; Proteomics; Spectroscopy; Structural biology; Synthetic chemistry; Tissue culture; Toxicology; and Virology.