{"title":"From explainable artificial intelligence to human understanding","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2025.100131","DOIUrl":"10.1016/j.ailsci.2025.100131","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100131"},"PeriodicalIF":0.0,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143825943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-objective synthesis planning by means of Monte Carlo Tree search","authors":"Helen Lai , Christos Kannas , Alan Kai Hassen , Emma Granqvist , Annie M. Westerlund , Djork-Arné Clevert , Mike Preuss , Samuel Genheden","doi":"10.1016/j.ailsci.2025.100130","DOIUrl":"10.1016/j.ailsci.2025.100130","url":null,"abstract":"<div><div>We introduce a multi-objective search algorithm for retrosynthesis planning, based on a Monte Carlo Tree search formalism. The multi-objective search allows for combining diverse set of objectives without considering their scale or weighting factors. To benchmark this novel algorithm, we employ four objectives in a total of eight retrosynthesis experiments on a PaRoutes benchmark set. The objectives range from simple ones based on starting material and step count to complex ones based on synthesis complexity and route similarity. We show that with the careful employment of complex objectives, the multi-objective algorithm can outperform the single-objective search and provides a more diverse set of solutions. However, for many target compounds, the single- and multi-objective settings are equivalent. Nevertheless, our algorithm provides a framework for incorporating novel objectives for specific applications in synthesis planning.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100130"},"PeriodicalIF":0.0,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143509997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing uncertainty quantification in drug discovery with censored regression labels","authors":"Emma Svensson , Hannah Rosa Friesacher , Susanne Winiwarter , Lewis Mervin , Adam Arany , Ola Engkvist","doi":"10.1016/j.ailsci.2025.100128","DOIUrl":"10.1016/j.ailsci.2025.100128","url":null,"abstract":"<div><div>In the early stages of drug discovery, decisions regarding which experiments to pursue can be influenced by computational models for quantitative structure–activity relationships (QSAR). These decisions are critical due to the time-consuming and expensive nature of the experiments. Therefore, it is becoming essential to accurately quantify the uncertainty in machine learning predictions, such that resources can be used optimally and trust in the models improves. While computational methods for QSAR modeling often suffer from limited data and sparse experimental observations, additional information can exist in the form of censored labels that provide thresholds rather than precise values of observations. However, the standard approaches that quantify uncertainty in machine learning cannot fully utilize censored labels. In this work, we adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels by using the Tobit model from survival analysis. Our results demonstrate that despite the partial information available in censored labels, they are essential to reliably estimate uncertainties in real pharmaceutical settings where approximately one-third or more of experimental labels are censored.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100128"},"PeriodicalIF":0.0,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143429573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conformal prediction-based machine learning in Cheminformatics: Current applications and new challenges","authors":"Mario Astigarraga , Andrés Sánchez-Ruiz , Gonzalo Colmenarejo","doi":"10.1016/j.ailsci.2025.100127","DOIUrl":"10.1016/j.ailsci.2025.100127","url":null,"abstract":"<div><div>Conformal Prediction (CP) is a distribution-free Machine Learning (ML) framework that has been developed in the last ∼25 years to provide well calibrated prediction subsets/intervals that include the true label with a user pre-defined probability, only requiring data exchangeability. It is based on the concept of <em>nonconformity</em> (or dissimilarity) of the new prediction compared to previous data and their predictions, so that the prediction subset/interval size is larger for new “unusual” instances and smaller for “typical” instances. Given its simplicity and ease of applicability, since 2012 it has been widely adopted in Cheminformatics, especially in the Quantitative Structure-Activity Relationship (QSAR) modeling and Molecular Screening areas. This rapid popularization of CP in Cheminformatics can be explained on the grounds that: (a) it can handle the applicability domain (AD) issue of ML models, of large importance in Cheminformatics due to the immense size of the chemical space; (b) it deals with classification of heavily imbalanced datasets typical in Molecular Screening; and (c) it quantifies compound-specific prediction uncertainties, especially useful as it allows to implement gain-cost strategies to accelerate drug discovery by reducing compounds to test. This comprehensive review introduces the method, provides a full appraisal of the work done in the field of Cheminformatics (with special emphasis in the QSAR and Molecular Screening arenas), and discusses its pros and cons and new challenges, especially for Deep Learning applications and nonexchangeable datasets, a very frequent situation in Cheminformatics.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100127"},"PeriodicalIF":0.0,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143402856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LIDEB's Useful Decoys (LUDe): A freely available decoy-generation tool. Benchmarking and scope","authors":"Lucas N. Alberca , Denis N. Prada Gori , Maximiliano J. Fallico , Alexandre V. Fassio , Alan Talevi , Carolina L. Bellera","doi":"10.1016/j.ailsci.2025.100129","DOIUrl":"10.1016/j.ailsci.2025.100129","url":null,"abstract":"<div><div>In the field of chemoinformatics, and in particular, when developing models to be applied in virtual screening campaigns, it is essential to run retrospective virtual screening experiments that evaluate the performance of such models in a scenario similar to the real one. That is, the ability to recover a small number of active compounds dispersed among a much larger number of compounds without the desired activity. However, such a retrospective experiment is often limited by the relative scarcity of known inactive compounds against the pharmacological target of interest. In these cases, automatic decoy (putative inactive compound) generation tools are often of great importance. Their basic goal is to generate decoys that are similar enough to the known active compounds to challenge the models, but different enough so that the probability that the decoys modulate the molecular target of interest is small.</div><div>In this article, we report the latest version of our open-source decoy generation tool LUDe, inspired by the well-known DUD-E but designed to reduce the probability of generating decoys topologically similar to known active compounds. We have carried out a benchmarking exercise against DUD-E through 102 pharmacological targets, using the DOE score and the Doppelganger score as comparison criteria. LUDe decoys obtained better DOE scores across most of the targets, indicating a lower risk of artificial enrichment. The mean Doppelganger score, in contrast, was similar for LUDe and DUD-E decoys, exhibiting a slight improvement for LUDe decoys for most of the targets. Simulation experiments were performed to verify whether the generated decoys are unsuitable to validate ligand-based models. Our results suggest that LUDe decoys are apt to be used to validate and compare machine learning ligand-based screening approaches. Importantly, LUDe may be used locally, independently from external server availability, and is thus suitable to obtain decoys from large datasets. It is available as a Web App (at https://lideb.biol.unlp.edu.ar/?page_id=1076) and as Python code at (https://github.com/LIDeB/LUDe.v1.0)</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100129"},"PeriodicalIF":0.0,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“Foundation models for research: A matter of trust?”","authors":"Koen Bruynseels , Lotte Asveld , Jeroen van den Hoven","doi":"10.1016/j.ailsci.2025.100126","DOIUrl":"10.1016/j.ailsci.2025.100126","url":null,"abstract":"<div><div>Science would not be possible without trust among experts, trust of the public in experts, and reliance on scientific instruments and methods. The rapid adoption of scientific foundation models and their use in AI agents is changing scientific practices and thereby impacting this epistemic fabric which hinges on trust and reliance. Foundation models are machine learning models that are trained on large bodies of data and can be applied to a multitude of tasks. Their application in science raises the question of whether scientific foundation models can be relied upon as a research tool and to what extent, or even be trusted as if they were research partners.</div><div>Conceptual clarification of the notions of trust and reliance in science is pivotal in the face of foundation models. Trust and reliance form the glue for the increasingly distributed epistemic labour within contemporary technoscientific systems. We build on two concepts of trust in science, namely trust in science as shared values, and trust in science based on commitments to processes that provide objective claims. We analyse whether scientific foundation models are research tools to which the concept of reliance applies, or research partners that can be trustworthy or not. We consider these foundation models within their socio-technical contexts.</div><div>Allocation of trust should be reserved for human agents and the organizations they operate in. Reliance applies to foundation models and artificial intelligence agents. This distinction is important to unambiguously allocate responsibility, which is crucial in maintaining the fabric of trust that underpins science.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100126"},"PeriodicalIF":0.0,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143509644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Actively protective combinatorial analysis: A scalable novel method for detecting variants that contribute to reduced disease prevalence in high-risk individuals","authors":"J Sardell, S Das, K Taylor, C Stubberfield, A Malinowski, M Strivens, S Gardner","doi":"10.1016/j.ailsci.2025.100125","DOIUrl":"10.1016/j.ailsci.2025.100125","url":null,"abstract":"<div><div>We present a novel method for routinely identifying disease resilience associations that offers powerful insights for the discovery of a new class of disease protective targets. We show how this can be used to identify mechanisms in the background of normal cellular biology that work to slow or stop progression of complex, chronic diseases.</div><div>Actively protective combinatorial analysis identifies combinations of features that contribute to reducing risk of disease in individuals who remain healthy even though their genomic profile suggests that they have high risk of developing disease. These protective signatures can potentially be used to identify novel drug targets, pharmacogenomic and/or therapeutic mRNA opportunities and to better stratify patients by overall disease risk and mechanistic subtype.</div><div>We describe the method and illustrate how it offers increased power for detecting disease-associated genetic variants relative to traditional methods. We exemplify this by identifying individuals who remain healthy despite possessing several disease signatures associated with increased risk of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) or amyotrophic lateral sclerosis (ALS). We then identify combinations of SNP-genotypes significantly associated with reduced disease prevalence in these high-risk protected cohorts.</div><div>We discuss how actively protective combinatorial analysis generates novel insights into the genetic drivers of established disease biology and detects gene-disease associations missed by standard statistical approaches such as meta-GWAS. The results support the mechanism of action hypotheses identified in our original causative disease analyses. They also illustrate the potential for development of precision medicine approaches that can increase healthspan by reducing the progression of disease.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100125"},"PeriodicalIF":0.0,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical diagnostics and medical image analysis","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2024.100119","DOIUrl":"10.1016/j.ailsci.2024.100119","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100119"},"PeriodicalIF":0.0,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143578542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable artificial intelligence for targeted protein degradation predictions","authors":"Francis J. Prael III , Jutta Blank , William C. Forrester , Lingling Shen , Raquel Rodríguez-Pérez","doi":"10.1016/j.ailsci.2024.100121","DOIUrl":"10.1016/j.ailsci.2024.100121","url":null,"abstract":"<div><div>Defining structure-activity relationships (SAR) is a central task in medicinal chemistry. Apart from optimizing activity against the target of interest, off-target activities and other properties need to be balanced to ensure a suitable property profile, which is an exceptional challenge in drug design. Machine learning (ML) can identify structural patterns in large compound collections that are correlated to biological activity or other molecular properties. Such ML-based SAR modeling has the potential of greatly assisting in compound optimization. However, the black-box character of most ML models has limited their application to help establishing SAR hypotheses. Explainable ML or, more generally, explainable artificial intelligence (XAI) aims at “opening the black box” by estimating how model inputs – e.g., chemical structures – contribute to model predictions. Although a variety of model interpretation methods have been proposed, XAI for medicinal chemistry is still an active field of research and XAI strategies are dominated by proofs of concept rather than by practical applications in drug discovery programs. Moreover, with the advent of new modalities, the applicability of ML and XAI models remains under-investigated. Herein, we present a novel application of XAI methods to targeted protein degradation (TPD) predictions. We report a case study of ML-based SAR modeling with explainable predictions of Cereblon (CRBN) glues for GSPT1 (G1 to S phase transition 1 protein). We showcase how XAI results were able to mirror expert knowledge based on structural data. Importantly, quantitative evaluations showed the ability of our ML/XAI workflow to accurately describe TPD activity cliffs across different proteins. These findings support use of the proposed XAI strategy to help rationalizing model predictions and illustrates how XAI methods can be exploited to balance SAR across different targets or properties for the new modality of TPDs.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100121"},"PeriodicalIF":0.0,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Publication of research using proprietary data","authors":"Raquel Rodríguez-Pérez , Jürgen Bajorath","doi":"10.1016/j.ailsci.2024.100120","DOIUrl":"10.1016/j.ailsci.2024.100120","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"7 ","pages":"Article 100120"},"PeriodicalIF":0.0,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143592479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}