Journal of Cheminformatics: latest articles, page 3

Multiscale analysis and optimal glioma therapeutic candidate discovery using the CANDO platform.

IF 5.7 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-04-12 DOI: 10.1186/s13321-026-01191-9

Sumei Xu, Yakun Hu, William Mangione, Melissa Van Norden, Katherine Elefteriou, Zackary Falls, Ram Samudrala

Abstract: Glioma is a highly malignant brain tumor with limited treatment options. We employed the Computational Analysis of Novel Drug Opportunities (CANDO) platform for multiscale therapeutic discovery to predict new glioma therapies. We began by computing interaction scores between extensive libraries of drugs/compounds and proteins to generate "interaction signatures" that model compound behavior on a proteomic scale. Compounds with signatures most similar to those of drugs approved for a given indication were considered potential treatments. These compounds were further ranked by degree of consensus in corresponding similarity lists. We benchmarked performance by measuring the recovery of approved drugs in these similarity and consensus lists at various cutoffs, using multiple metrics and comparing results to random controls and performance across all indications. Compounds ranked highly by consensus but not previously associated with the indication of interest were considered new predictions. Our benchmarking results showed that CANDO improved accuracy in identifying glioma-associated drugs across all cutoffs compared to random controls. Our predictions, supported by literature-based analysis, identified 24 potential glioma treatments, including approved drugs like vitamin D, taxanes, vinca alkaloids, topoisomerase inhibitors, and folic acid, as well as investigational compounds such as ginsenosides, chrysin, resiniferatoxin, and cryptotanshinone. Further functional annotation-based analysis of the top targets with the strongest interactions to these predictions identified Vitamin D3 receptor, thyroid hormone receptor, acetylcholinesterase, cyclin-dependent kinase 2, tubulin alpha chain, dihydrofolate reductase, and thymidylate synthase. These findings indicate that CANDO's multitarget, multiscale framework is effective in identifying glioma drug candidates, thereby informing new strategies for improving treatment.

Scientific contribution: (1) We present a robust, multiscale drug discovery framework that accurately recovers known glioma therapies and uncovers 24 novel candidates with strong literature and mechanistic support. (2) By modeling compound behavior across the proteome, our method pinpoints key targets, including VDR, CDK2, and DHFR, implicated in glioma biology. (3) This work positions CANDO as a powerful tool for rational repurposing and discovery of urgently needed treatments for aggressive brain tumors.
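The signature-similarity and consensus ranking described in this abstract can be sketched roughly as follows. This is a toy illustration only: the abstract does not state which similarity metric or consensus scoring CANDO uses, so cosine similarity, the function names, and the three-protein signatures below are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two interaction signatures (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_similar(query, signatures, k):
    """Names of the k compounds most similar to `query`, excluding itself."""
    scores = [(cosine(signatures[query], sig), name)
              for name, sig in signatures.items() if name != query]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scores[:k]]

def consensus_rank(approved, signatures, k):
    """Count how often each non-approved compound appears in the top-k
    similarity lists of the approved drugs, then sort by that count."""
    counts = Counter()
    for drug in sorted(approved):
        for name in top_similar(drug, signatures, k):
            if name not in approved:
                counts[name] += 1
    return sorted(counts.items(), key=lambda t: (-t[1], t[0]))

# Hypothetical signatures: one interaction score per protein.
signatures = {
    "drug_A": [0.9, 0.1, 0.0],
    "drug_B": [0.8, 0.2, 0.1],
    "cand_X": [0.85, 0.15, 0.05],
    "cand_Y": [0.0, 0.1, 0.9],
}
approved = {"drug_A", "drug_B"}
ranking = consensus_rank(approved, signatures, k=2)
print(ranking)
```

In this toy set, `cand_X` appears in the top-2 list of both approved drugs and is therefore the only consensus prediction; `cand_Y` never enters a list.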

Citations: 0

Is scaffold hopping possible in machine learning using the electronic-structure-informatics (ESI) descriptor set? an application to natural-product-based drug discovery of α-glucosidase inhibitors.

IF 8.6 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-04-06 DOI: 10.1186/s13321-026-01192-8

Yusuke Tateishi, Manabu Sugimoto

Abstract: The electronic-structure informatics (ESI) descriptor set was applied to discover novel α-glucosidase inhibitors from a natural product (NP) database. The in silico screening was carried out through regression modelling of inhibitory activity (pIC50) using XGBoost with the ESI descriptor set. The optimized model achieved a test R² of 0.85, demonstrating high predictive accuracy. To explore potent NPs for α-glucosidase inhibition, in silico screening of 2623 NPs was performed. Known NP α-glucosidase inhibitors such as theasinensin A, chebulagic acid, and casuarictin were "re-identified" through the screening. It also revealed structurally novel NP compounds with moderate inhibitory activity and new scaffolds different from those of the known inhibitors. A series of docking simulations on the newly discovered compounds revealed that their binding scores are higher than that of the marketed drug acarbose. These results demonstrate the applicability and uniqueness of the ESI descriptor set in "scaffold hopping" using NP databases.

Scientific contribution: This study shows that the electronic-structure informatics (ESI) descriptor set supports effective scaffold hopping for discovering α-glucosidase inhibitors from natural product (NP) libraries beyond conventional structure-based searches. By combining quantum-chemistry-derived ESI descriptors with machine learning, we identify structurally novel NP inhibitor candidates, which exhibit competitive predicted activity despite low similarity to known chemotypes. This work demonstrates the value of electronic-structure information in in silico screening for identifying chemically diverse candidates.
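For readers unfamiliar with the two quantities quoted in this abstract, the sketch below shows the standard definitions of pIC50 (the negative base-10 logarithm of IC50 in mol/L) and of the coefficient of determination R². The numbers are toy values chosen for illustration, not data from the paper, and the function names are my own.

```python
import math

def pic50(ic50_molar):
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy measured vs. predicted pIC50 values (illustrative only).
y_true = [4.0, 5.0, 6.0, 7.0]
y_pred = [4.5, 5.0, 5.5, 7.0]
r2 = r2_score(y_true, y_pred)
print(round(r2, 3), round(pic50(1e-6), 3))
```

An IC50 of 1 µM corresponds to a pIC50 of 6, so a one-unit error in predicted pIC50 is a tenfold error in potency.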

Citations: 0

Constructing and characterizing trillion-scale combinatorial chemical library.

IF 8.6 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-04-04 DOI: 10.1186/s13321-026-01185-7

Jiaqi Su, Fu V Song, Dawei Huang, Maofu Liao

Abstract: The scale of screening within ultra-large chemical spaces plays a pivotal role in contemporary drug discovery, particularly during the initial stages of hit identification. Constructing such chemical spaces so that they can be explored efficiently for compound discovery is a critical challenge. In this research, we generated a novel combinatorial chemical library by integrating a curated reaction set with over 1.8 million available building blocks, resulting in a chemical space that theoretically contains more than one trillion molecules. To characterize the structural diversity, novelty, and physicochemical properties of the compounds in this trillion-scale library, we used a randomly sampled subset as a surrogate for exploration rather than an exhaustive search algorithm. Our results demonstrate that, at both the fragment-level and molecule-level chemical spaces, the constructed library encompasses broad physicochemical diversity and rich scaffold novelty, overlapping with but also extending beyond natural product and FDA-approved chemical spaces. Scaffold retrieval analyses indicated near-ideal structural diversity at scale. Following virtual screening, the hit compounds need to be chemically synthesized, which is often resource demanding. Leveraging the unique characteristics of the combinatorial chemical space and employing a Quadratic Unconstrained Binary Optimization (QUBO) model, we have developed a strategy to maximize the utilization of building blocks for chemical synthesis to generate a larger number of molecules with desired properties (drug-likeness, natural product-likeness, and structure similarity). Together, our work has established a theoretically trillion-scale combinatorial chemical library that can facilitate efficient virtual screening and hit identification, and has further developed a novel method for resource-optimized chemical synthesis.

Scientific contribution: This study presents a synthetically accessible trillion-scale combinatorial chemical library constructed from a curated reaction set and over 1.8 million building blocks, providing a highly diverse and scalable resource for ultra-large virtual screening. In addition, we develop a Quadratic Unconstrained Binary Optimization (QUBO)-based strategy for resource-efficient compound synthesis, which maximizes building block utilization while generating molecules with desired properties. Together, this work establishes an integrated framework that bridges large-scale chemical space exploration with practical synthesis, enabling more efficient hit identification and downstream optimization in drug discovery.
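To make the QUBO idea concrete, here is a minimal sketch of how a building-block selection problem can be written as a QUBO and solved by brute force on a toy instance. The formulation is an assumption for illustration (a unit purchase cost per block on the diagonal, a negative reward on each pair of blocks whose product is a desired hit); the abstract does not give the authors' actual objective, and real instances would need a dedicated QUBO solver rather than enumeration.

```python
from itertools import product

def qubo_energy(x, Q):
    """Energy x^T Q x for a binary vector x and an upper-triangular Q
    stored as a dict {(i, j): weight} with i <= j."""
    n = len(x)
    return sum(Q.get((i, j), 0.0) * x[i] * x[j]
               for i in range(n) for j in range(i, n))

def solve_qubo_bruteforce(n, Q):
    """Enumerate all 2^n assignments; feasible only for tiny n."""
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))
    return best, qubo_energy(best, Q)

# Hypothetical instance: blocks 0 and 1 are one reactant class,
# blocks 2 and 3 the other. Diagonal = cost of buying a block;
# off-diagonal = minus the value of the hit made from that pair.
Q = {
    (0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0, (3, 3): 1.0,
    (0, 2): -3.0, (0, 3): -3.0, (1, 2): -0.5,
}
best, energy = solve_qubo_bruteforce(4, Q)
print(best, energy)
```

Here the optimum buys blocks 0, 2, and 3: block 0 is reused in two valuable products, while block 1 contributes too little to justify its cost. That reuse of a shared block is the effect the abstract refers to as maximizing building block utilization.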

Citations: 0

AI-assisted interpretation of Markush structures in pharmaceutical patents: a review of emerging tools, datasets, and challenges.

IF 5.7 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-04-03 DOI: 10.1186/s13321-026-01172-y

Jennifer M Umbles Hayes, Emmanuel O Olawode, Anietie Andy, Edmund Essah Ameyaw

Abstract: Automated interpretation of Markush structures, which are widely used in pharmaceutical patents to claim large families of related compounds, remains challenging due to non-machine-readable structure images, variable R-groups, dependency rules, scaffold diversity, and heterogeneous claim language. Challenges include attachment points and stereochemistry, nested/conditional dependencies, and inconsistent drafting conventions that hinder faithful enumeration. Early rule-based cheminformatics systems parsed claims and mapped Markush representations into searchable formats, but struggled with nested dependencies, cross-references, and multimodal (text + image) descriptions. More recently, artificial intelligence (AI) methods have been introduced, including language-based tools, vision-based tools, and multimodal or hybrid tools. Language-based tools increasingly use large language models (LLMs) and natural language processing (NLP) capabilities to extract variable definitions, constraints, and dependency graphs from claim text; vision systems translate structure depictions into machine-readable formats (e.g., SMILES, CXSMILES); multimodal or hybrid pipelines align both for end-to-end interpretation. Emerging datasets support these efforts, though licensing, family-wise leakage, and standardized splits remain inconsistent. This narrative review synthesizes tools, datasets, and evaluation practices for AI-assisted Markush interpretation, identifies persistent failure modes, and maps open legal questions (sufficiency, enablement, enforceability). We outline priorities for the field: transparent benchmarks with family-aware splits, interpretable constraint handling, and workflows aligned with U.S. Patent Office practice. Near-term use is decision support, not legal advice.
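As a small illustration of what "enumeration" with variable R-groups and dependency rules means, the sketch below expands a scaffold template over all R-group choices and filters out combinations that violate a conditional rule. The scaffold, the R-group lists, and the rule are hypothetical, and plain string substitution into a SMILES template is a deliberate simplification; real Markush handling must also deal with attachment points, stereochemistry, and nested definitions.

```python
from itertools import product

def enumerate_markush(scaffold, r_groups, is_allowed=lambda choice: True):
    """Expand a scaffold template over every combination of R-group values,
    keeping only combinations accepted by the dependency rule `is_allowed`."""
    names = sorted(r_groups)
    structures = []
    for combo in product(*(r_groups[name] for name in names)):
        choice = dict(zip(names, combo))
        if is_allowed(choice):
            structures.append(scaffold.format(**choice))
    return structures

# Hypothetical claim: a benzene ring with two variable substituents.
scaffold = "c1ccc({R1})cc1{R2}"
r_groups = {"R1": ["F", "Cl", "OC"], "R2": ["C", "N"]}

def rule(choice):
    """Hypothetical dependency: when R1 is methoxy, R2 may not be amino."""
    return not (choice["R1"] == "OC" and choice["R2"] == "N")

structures = enumerate_markush(scaffold, r_groups, rule)
print(len(structures), structures)
```

Three choices for R1 and two for R2 give six raw combinations; the dependency rule removes one, leaving five enumerated structures. Even this toy case shows why a missed or misread constraint changes the claimed set.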

Citations: 0

A pipeline for developing AI-driven models to predict molecular initiating events: a case study on neural tube defects.

IF 8.6 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-04-02 DOI: 10.1186/s13321-026-01177-7

Job H Berkhout, Merel Florian, Domenico Gadaleta, Aldert H Piersma, Harm J Heusinkveld

Abstract: Adverse Outcome Pathways (AOPs) describe the sequence of molecular and cellular events that lead to toxicity. Each pathway begins with a Molecular Initiating Event (MIE) and ends in an Adverse Outcome. Early identification of chemical activity on MIE-relevant protein targets supports first-line toxicity assessment and helps researchers prioritize mechanisms for subsequent experimental investigation. Here we present an automated AI pipeline that converts raw ChEMBL bioactivity data into optimized deep learning models for MIE prediction. The pipeline builds on the Knowledge-Guided Pre-training of Graph Transformer (KPGT) framework, which represents chemical structures as knowledge-enriched molecular graphs. It integrates data curation, molecular graph generation, and model training and tuning. This integration enables users to construct target-specific prediction models in a seamless and reproducible way, starting from initial data and ending with deployable AI. We demonstrate its use in a neural tube defect (NTD) case study, where fine-tuned KPGT models outperformed traditional Support Vector Machine models with a radial basis function kernel (SVM-RBF) when predicting MIEs linked to developmental toxicity. The results highlight the potential of AI-driven toxicity modeling to accelerate AOP development, improve endpoint prioritization, and prioritize chemicals for experimental follow-up. By providing an end-to-end, data-to-model workflow, the pipeline lowers the technical barrier to using modern graph-based neural architectures in toxicology. It offers a reproducible route to deployable MIE prediction models that support AOP development, compound prioritization, and early-stage chemical safety evaluation.

Citations: 0

A novel approach for enhancing the potency of kinase inhibitors using topological water networks.

IF 8.6 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-03-31 DOI: 10.1186/s13321-026-01188-4

Re Gin Jeoung, Anand Balupuri, Nayoung Lim, Sungwook Choi, Nam Sook Kang

Abstract: Optimizing the potency of kinase inhibitors remains a major challenge due to the structural conservation of kinase binding pockets. While several computational methods have been developed to address this issue, most overlook the crucial role of water molecules within the binding site. Our research aims to address this gap by examining topological water networks (TWNs) within the target protein. We identified specific TWN-derived patterns in kinase binding sites that align with known crystallographic kinase fragments. Our findings reveal the potential of TWNs to significantly improve the identification and optimal placement of promising fragments within protein binding sites. Here, we propose a TWN-based fragment growing (TWN-FG) method that enhances kinase inhibitor potency by leveraging the topological characteristics of hydration networks. TWN-FG successfully explains structure-activity relationship (SAR) trends of known kinase inhibitors and has been applied to design and synthesize a potent mixed lineage kinase 1 (MLK1) inhibitor. The source code is available at https://github.com/RgJeoung/TWN-FG to support further research and application.

Citations: 0

graphpancake: a Python package for representing organic molecules as molecular graphs utilizing electronic structure theory.

IF 5.7 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-03-31 DOI: 10.1186/s13321-026-01182-w

Sneha Sil, Mark A Maskeri, Karl A Scheidt

Abstract: Computational methods for predictive modeling have been increasingly utilized in the early stages of drug discovery to supplement high-throughput screening. The advent of highly efficient and complex machine learning architectures necessitates new methods of collating the plethora of topological, geometrical, and quantum chemical data from small molecule drug candidates to interface with these advanced cheminformatics methods. Graph-based molecular representations encoding atomic and bonding features allow physicochemical meaning to be leveraged by computational methods to predict quantitative structure-activity relationships (QSAR) with high precision and accuracy. We present graphpancake, an open-source Python package that translates quantum mechanical data from density functional theory (DFT) and post-Hartree-Fock (HF) wavefunction calculations into molecular graphs of small organic molecules. graphpancake is a quantum chemical graph generator intended for cheminformatics pipelines, featuring a command-line utility, hierarchical graph types with varying feature complexity, and thorough user documentation. To test graphpancake's utility, we have curated several datasets (regression and classification) from the literature. Using random forest and message passing neural network architectures, we demonstrate how graphpancake's quantum-enriched features can predict quantitative properties more accurately (R² values 0.3-0.5 higher) compared to SMILES-generated features. Our code is available on GitHub (MIT license).

Citations: 0

Contrastive representation learning and capsule networks enable accurate identification of ferroptosis-related proteins

IF 5.7 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-03-28 Epub Date: 2026-05-07 DOI: 10.1186/s13321-026-01183-9

Yiyang Zhao, Xingchen Liu, Peilin Xie, Jiahui Guan, Zhihao Zhao, Junwen Wang, Tzong-Yi Lee, Ying-Chih Chiang, Leyi Wei, Xiangrong Liu, Lantian Yao

Abstract: Ferroptosis is a distinct iron-dependent form of regulated cell death that plays critical roles in cancer progression, neurodegenerative disorders, and immune regulation. Computational identification of ferroptosis-related proteins (FRPs) remains challenging due to the complex regulatory network of ferroptosis, the functional heterogeneity of FRPs, and the limited availability of experimentally validated data. Accurate and high-throughput prediction of FRPs is therefore urgently needed. To address these challenges, we propose FeroConCap, a novel deep learning framework that integrates fractal chaos game representation (FCGR) encoding, capsule networks, and supervised contrastive learning to capture hierarchical and spatial sequence dependencies associated with ferroptosis. The supervised contrastive strategy enhances intra-class compactness while increasing inter-class separability in the embedding space, leading to more robust and discriminative representations. Using a benchmark dataset of 2298 non-redundant protein sequences, FeroConCap achieved state-of-the-art performance, with an accuracy of 95.65% and an MCC of 0.915, exceeding the current method by 4.13% in accuracy and 0.084 in MCC. Comprehensive ablation studies and feature visualization analyses further confirm that both FCGR encoding and the capsule architecture substantially contribute to performance improvement over traditional handcrafted descriptors. To facilitate practical applications, a user-friendly web server has been developed for efficient and large-scale FRP prediction, freely available at https://ycclab.cuhk.edu.cn/FeroConCap.

This study introduces FeroConCap, a novel deep learning framework that enhances the identification of ferroptosis-related proteins. By integrating FCGR encoding with Capsule Networks and supervised contrastive learning, the method effectively captures complex sequence patterns, outperforming existing methods like FRP-XGBoost. This work provides a user-friendly web server, offering a robust tool for high-throughput screening in ferroptosis research and drug discovery.

PDF: https://link.springer.com/content/pdf/10.1186/s13321-026-01183-9.pdf
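Because this abstract reports both accuracy and the Matthews correlation coefficient (MCC), a short sketch of how the two are computed from a binary confusion matrix may help. The counts below are invented toy values, not the paper's results, and the function names are my own.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Returns 0.0 when a marginal total is zero, a common convention."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy balanced test set: 50 positives and 50 negatives (illustrative only).
acc = accuracy(tp=45, tn=45, fp=5, fn=5)
score = mcc(tp=45, tn=45, fp=5, fn=5)
print(acc, score)
```

On this balanced toy set, 90% accuracy corresponds to an MCC of 0.8. MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classifiers that may face class imbalance.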

Citations: 0

Correction: Integrating artificial intelligence and manual curation to enhance bioassay annotations in ChEMBL

IF 5.7 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-03-27 DOI: 10.1186/s13321-026-01181-x

Ines Smit, Melissa F. Adasme, Emma Manners, Sybilla Corbett, Nicolas Bosc, Hoang-My-Anh Do, Andrew R. Leach, Noel M. O’Boyle, Barbara Zdrazil

Citations: 0

Perspective on applicability of data-driven machine learning computational new approach methodologies for hazard identification in chemicals risk assessment

IF 5.7 · Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2026-03-26 Epub Date: 2026-05-05 DOI: 10.1186/s13321-026-01184-8

Geven Piir, Sulev Sild, Olga Tcheremenskaia, Emma Di Consiglio, Jörgen Henriksson, Agnieszka Gajewicz-Skretna, Enrico Mombelli, Alessandra Roncaglioni, Uko Maran

Abstract: Machine Learning (ML) and Artificial Intelligence (AI) approaches have the potential to support better-informed decisions in chemical hazard identification while reducing animal testing. Their application in the context of New Approach Methodologies (NAMs) for Hazard Identification in Chemicals Risk Assessment (CRA) is challenging due to limited knowledge, lack of experience, and uncertainty related to the use of these approaches. Therefore, to facilitate the potential acceptance of ML and AI approaches for regulatory use, better standardization, guidelines for transparent reporting, validation, and frameworks are needed to understand their accessibility, verifiability, and usefulness criteria for predictions. An extensive literature review on the availability of ML- and AI-based NAMs for chemical hazard identification was conducted, focusing primarily on human health endpoints: specific target organ toxicity (STOT), genotoxicity and carcinogenicity, endocrine disruption, skin sensitization, developmental and reproductive toxicity (DART), and repeated dose or chronic toxicity. Nearly 2300 scientific articles were reviewed, and analysis of the 274 publications with ML-QSAR models revealed that 60.9% of the models described in the scientific literature were non-usable, 21.9% were potentially usable, and 17.2% were directly usable, i.e., had available software solutions. By endpoint, skin sensitization is best covered by ML-QSAR models, followed by endocrine disruption, genotoxicity, and carcinogenicity. The most frequently derived ML-QSAR models are tree-based models, such as random forests and analogues, followed by artificial neural networks and support vector machine models, with other models being used to a lesser extent. The literature analysis led to a framework that helps model users identify potentially suitable models for use in a regulatory context. In addition, the framework could help model developers better understand the expectations of model users in a regulatory context and use the framework as a reference when publishing their models, ensuring greater transparency, alignment with regulatory needs, and facilitating future acceptance.

PDF: https://link.springer.com/content/pdf/10.1186/s13321-026-01184-8.pdf

Citations: 0