Dedenser: A Python Package for Clustering and Downsampling Chemical Libraries

Impact factor: 5.3; ranking tier: 2 (Chemistry); JCR quartile: Q1 (CHEMISTRY, MEDICINAL)
Armen G Beck, Jonathan Fine, Yu-Hong Lam, Edward C Sherer, Erik L Regalado, Pankaj Aggarwal
Journal of Chemical Information and Modeling, pp. 1053-1060. Publication date: 2025-02-10 (e-published 2025-01-30). DOI: 10.1021/acs.jcim.4c01980 (https://doi.org/10.1021/acs.jcim.4c01980)
Citations: 0

Abstract


The screening of chemical libraries is an essential starting point in the drug discovery process. While some researchers prefer to screen drug targets more thoroughly against a narrower scope of molecules, diverse screening sets are often favored during the early stages of drug discovery. However, screening molecules carries a cost burden, and needlessly overrepresenting particular areas of chemical space has potential drawbacks. To facilitate triaged sampling of chemical libraries and other collections of molecules, we have developed Dedenser, a tool for downsampling chemical clusters. Dedenser reduces the membership of clusters within chemical point clouds while maintaining their initial topology, or distribution, in chemical space. Dedenser is a Python package that first uses Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to identify clusters in 3D chemical point clouds and then downsamples them by applying Poisson disk sampling to each cluster based on either its volume or its density in chemical space. Dedenser includes a command-line interface tool and a graphical user interface that generate chemical point clouds, using Mordred for QSAR descriptor calculations and uniform manifold approximation and projection (UMAP) for 3D embedding, and that visualize the results. We hope that Dedenser will serve the community by enabling quick access to reduced collections of molecules that are representative of larger sets, selecting even distributions of molecules within clusters rather than single representative molecules from each cluster. All code for Dedenser is open source and available at https://github.com/MSDLLCpapers/dedenser.
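The abstract describes a two-stage pipeline: HDBSCAN labels clusters in a 3D embedding, and each cluster is then thinned by Poisson disk sampling. The following is a minimal, stdlib-only sketch of the second stage only, under stated assumptions; it is not Dedenser's API or its actual algorithm. All names (`dedense`, `downsample_to`, `poisson_disk_subset`, `bbox_volume`, `keep_fraction`) are hypothetical. Cluster labels are assumed to come from a prior HDBSCAN run, which marks noise with -1; noise points are assumed to be kept unchanged. Poisson disk sampling is approximated by greedy dart throwing over the existing points, with the disk radius bisected to meet a per-cluster budget, and the "volume" criterion is approximated by an axis-aligned bounding-box volume.

```python
import math
import random
from collections import defaultdict

Point = tuple[float, float, float]  # one molecule's 3D embedding coordinates


def poisson_disk_subset(points: list[Point], radius: float, seed: int = 0) -> list[int]:
    """Greedy dart throwing over existing points: visit points in random order and
    keep one only if it is at least `radius` away from every point kept so far."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    kept: list[int] = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= radius for j in kept):
            kept.append(i)
    return kept


def downsample_to(points: list[Point], target: int, seed: int = 0, steps: int = 30) -> list[int]:
    """Bisect the disk radius for a fixed number of steps and return the last
    dart-throwing subset that has at most `target` points."""
    if target <= 0:
        return []
    if target >= len(points):
        return list(range(len(points)))
    lo = 0.0
    hi = max(math.dist(p, q) for p in points for q in points) + 1.0
    best = poisson_disk_subset(points, hi, seed)  # radius above the diameter: one point
    for _ in range(steps):
        mid = (lo + hi) / 2
        kept = poisson_disk_subset(points, mid, seed)
        if len(kept) > target:
            lo = mid  # too many survivors: widen the exclusion radius
        else:
            hi, best = mid, kept  # within budget: try a smaller radius next
    return best


def bbox_volume(points: list[Point]) -> float:
    """Axis-aligned bounding-box volume, a crude stand-in for cluster volume."""
    extents = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(3)]
    return math.prod(max(e, 1e-9) for e in extents)  # floor avoids zero volume


def dedense(points: list[Point], labels: list[int], keep_fraction: float, seed: int = 0) -> list[int]:
    """Return sorted indices of retained points. Noise (label -1) is kept as-is;
    each cluster's share of the budget is proportional to its volume."""
    clusters: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    kept = clusters.pop(-1, [])
    if not clusters:
        return sorted(kept)
    budget = round(keep_fraction * sum(len(m) for m in clusters.values()))
    volumes = {c: bbox_volume([points[i] for i in m]) for c, m in clusters.items()}
    total_volume = sum(volumes.values())
    for c, members in clusters.items():
        share = round(budget * volumes[c] / total_volume)
        target = min(len(members), max(1, share))  # every cluster keeps >= 1 point
        local = downsample_to([points[i] for i in members], target, seed)
        kept.extend(members[i] for i in local)  # map local indices back to global
    return sorted(kept)
```

With a volume-proportional budget, a small but crowded cluster is thinned far more than a large, sparse one, which is one reading of "based on volume"; substituting a density estimate for `bbox_volume` would give a density-based variant. Greedy dart throwing compares each candidate against every kept point, so it is quadratic in cluster size; a larger-scale implementation would typically use a spatial index such as a k-d tree.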

Source journal: Journal of Chemical Information and Modeling
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.