Integration of diverse bioactivity data into the Chemical Checker compound universe.

Impact factor 16 · Zone 1 (Biology) · JCR Q1 (Biochemical Research Methods)
Arnau Comajuncosa-Creus, Martino Bertoni, Miquel Duran-Frigola, Adrià Fernández-Torras, Oriol Guitart-Pla, Nils Kurzawa, Martina Locatelli, Yasmmin Martins, Elena Pareja-Lorente, Gema Rojas-Granado, Nicolas Soler, Eva Viesi, Patrick Aloy
Nature Protocols, published 2025-05-21. DOI: 10.1038/s41596-025-01167-3 (https://doi.org/10.1038/s41596-025-01167-3)
Citations: 0

Abstract

Chemical signatures encode the physicochemical and structural properties of small molecules into numerical descriptors, forming the basis for chemical comparisons and search algorithms. The increasing availability of bioactivity data has improved compound representations to include biological effects (for example, induced gene expression changes), although bioactivity descriptors are often limited to a few well-documented molecules. To address this issue, we implemented a collection of deep neural networks able to leverage the experimentally determined bioactivity data associated with small molecules and infer the missing bioactivity signatures for any compound of interest. However, unlike static chemical descriptors, these bioactivity signatures dynamically evolve with new data and processing strategies. Here we present a computational protocol to modify or generate novel bioactivity spaces and signatures, describing the main steps needed to leverage diverse bioactivity data with the current knowledge, as catalogued in the Chemical Checker (CC; https://chemicalchecker.org/), using the predefined data curation pipeline. We illustrate the functioning of the protocol through four specific examples, including the incorporation of new compounds into an existing bioactivity space, a change in the data preprocessing without altering the underlying experimental data and the creation of two novel bioactivity spaces from scratch, which are completed in under 9 h using graphics processing unit computing. Overall, this protocol offers a guideline for installing, testing and running the CC data integration approach on user-provided data, extending the annotation presented for a limited number of small molecules to a larger chemical landscape and generating novel bioactivity signatures.
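To make the idea of signatures as "the basis for chemical comparisons and search algorithms" concrete, the following is a minimal sketch, not the Chemical Checker implementation. It assumes each compound is represented by a fixed-length numerical vector and ranks a small library by cosine similarity to a query. The function names, the compound names and the 4-dimensional toy vectors are all hypothetical and chosen only for illustration; the abstract does not specify the similarity measure or the signature format.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length signature vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero signature is treated as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: List[float], library: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k library compounds most similar to the query signature."""
    scored = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy signatures (made-up values, not real bioactivity data).
toy_library = {
    "compound_A": [1.0, 0.0, 0.5, 0.0],
    "compound_B": [0.9, 0.1, 0.4, 0.0],
    "compound_C": [0.0, 1.0, 0.0, 0.8],
}
query_signature = [1.0, 0.0, 0.5, 0.0]

hits = nearest_neighbors(query_signature, toy_library, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this toy setting the query ranks compound_A first, since their vectors are identical, followed by compound_B. A real signature search would operate on much longer vectors and many more compounds, and would typically use an indexed nearest-neighbor method rather than a full scan.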

Source journal: Nature Protocols
Category: Biology, Biochemical Research Methods
CiteScore: 29.10
Self-citation rate: 0.70%
Articles published: 128
Review time: 4 months
Journal description: Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured. The specific techniques covered encompass a wide range, including but not limited to: Biochemistry, Cell biology, Cell culture, Chemical modification, Computational biology, Developmental biology, Epigenomics, Genetic analysis, Genetic modification, Genomics, Imaging, Immunology, Isolation, purification, and separation, Lipidomics, Metabolomics, Microbiology, Model organisms, Nanotechnology, Neuroscience, Nucleic-acid-based molecular biology, Pharmacology, Plant biology, Protein analysis, Proteomics, Spectroscopy, Structural biology, Synthetic chemistry, Tissue culture, Toxicology, and Virology.